New approaches to the catalog
T. Hickey, http://errol.oclc.org/laf/n82-54463.html
Svensk Biblioteksförening (Swedish Library Association), 2005 October 28
OCLC
• Founded 1967
• Nonprofit membership organization
• >53,000 libraries in 96 countries
• ~1,000 employees
• Cataloging, Interlibrary Loan, Preservation, Dewey Decimal Classification, netLibrary, FirstSearch
OCLC Research
Research for both:
• OCLC services
• Membership
Metadata management, knowledge organization, content management, interoperability, systems & interaction design
~30 employees
What do users want?
The right information, with minimum effort
How to give them what they want
• Catch them where they are
• Increase our data
• Improve our data
• Make the data work harder
• Interconnect with other systems
• Do all this efficiently
What has changed
• Computers and telecommunications
• User expectations
• Digital materials
• Remoteness of our users
• Huge amounts of bandwidth and storage
The competition
Online booksellers:
• Reviews
• Tables of contents
• Excerpts
• Inside-the-book searching
Web search engines:
• Speed
• Full-text searching
• Global coverage (of web resources)
• Good enough
Ourselves:
• Electronic journals
Current projects (my group)
• Live search
• Registries, PURLs
• Dewey browser
• Harvesting, electronic theses
• VIAF, LAF
• SRU/W, OpenURLs, OAI
• FRBR, xISBN
• Beowulf cluster
• Map-reduce
• Text searching
• Batch loading
• Open WorldCat
• WorldCat Wiki
• Publisher names
• MXG
Other Research Projects
• FictionFinder, Curiouser
• Schema transformation
• Terminology services
• Digital preservation
• Collection analysis
• Dublin Core
• FAST
• User studies
• Data mining
Also: http://www.oclc.org/research/researchworks/
Catch them where they are
• Google, Yahoo, etc.
• Open WorldCat
• OpenURL
• OAI-PMH
Creation too:
• WorldCat Wiki
• Tags?
Open WorldCat
Editions
OpenURL
• OpenURL registry
• Supports version 1.0
• Also a registry of OpenURL servers
• Used for WikiD
WorldCat ‘Wiki’
Opening up WorldCat to user annotations:
• Reviews
• Notes
• Tables of contents
• Cover art?
• Book lists?
Based on WikiD software:
• Full Wiki
• Many features turned off for WorldCat
• Uses the OpenURL 1.0 protocol internally
• Allows collections of pages in arbitrary XML schemas
• Tools for creating simple collections
Doesn’t look like a Wiki
Reviews
Tags?
Folksonomies?
• User-generated keywords
• We've been here before
• Is it different?
• Is there another direction?
Opening Dewey
More data
Harvesting:
• OAI-PMH
• ETDs (electronic theses and dissertations)
Batch load:
• 60 million records
• 3 million new manifestations
Other:
• Cover art
• Reviews
• WC
Better data and organization
VIAF, FRBR
Authority files in general:
• LAF
• Publisher names
• Genre
• FAST
Registries:
• PURLs
• Generalized solution?
Get them nearer to creation
FRBR
Work-set algorithm:
• Keys based on author/title
• Authority files
• Auxiliary authority files
• xISBN
Used for:
• xISBN
• Open WorldCat
• FirstSearch (coming)
• Collection analysis (coming)
• Research
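The work-set idea (records that share a normalized author/title key belong to the same FRBR work) can be sketched as follows. This is a minimal illustration, not OCLC's algorithm: the normalization rules (case folding, diacritic and punctuation stripping) are assumptions, and the authority-file and xISBN steps listed on the slide are omitted; `work_key` and `work_sets` are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> str:
    """Hypothetical author/title key; real keys would first consult authority files."""
    return f"{normalize(author)}/{normalize(title)}"


def work_sets(records):
    """Group (record_id, author, title) tuples into work sets keyed by work_key."""
    groups = defaultdict(list)
    for record_id, author, title in records:
        groups[work_key(author, title)].append(record_id)
    return dict(groups)
```

With these rules, "Strindberg, August / Fröken Julie" and "STRINDBERG, AUGUST. / Froken Julie" collapse to the same key, which is the kind of variation a work set is meant to absorb.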
Authority Files
LAF:
• http://errol.oclc.org/laf/n82-54463.html
Publisher names:
• Not normally controlled
• Looking for name variants that share ISBN prefixes
• Also worked with dissertations
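A rough sketch of grouping uncontrolled publisher strings by ISBN prefix: strings that appear with the same prefix become candidate variants of one publisher. The fixed 7-digit prefix is a simplifying assumption (real ISBN registrant prefixes vary in length by range), ISBN-13 input is assumed, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def isbn_prefix(isbn13: str, length: int = 7) -> str:
    """Hypothetical fixed-length prefix; real registrant prefixes are variable-length."""
    digits = "".join(ch for ch in isbn13 if ch.isdigit())
    return digits[:length]


def publisher_variants(records, length: int = 7):
    """Map each ISBN prefix to the publisher strings (with counts) seen alongside it."""
    variants = defaultdict(Counter)
    for isbn, publisher in records:
        variants[isbn_prefix(isbn, length)][publisher.strip()] += 1
    return {prefix: dict(names) for prefix, names in variants.items()}


def candidate_groups(records, length: int = 7):
    """Keep only prefixes seen with more than one distinct publisher string."""
    return {
        prefix: names
        for prefix, names in publisher_variants(records, length).items()
        if len(names) > 1
    }
```

The candidates still need review; a shared prefix suggests, but does not prove, that two strings name the same publisher.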
VIAF
Merge national-level authority files: Library of Congress (NACO) and Die Deutsche Bibliothek
• Bibliographic records analyzed
• 15% of matches would be erroneous if based on names alone
Basic matching now completed:
• 435,000 matching names
• < 1% mismatched
Working on:
• Public interface
• OAI harvesting
• Persistent identifiers
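Because names alone would produce wrong links, a match can be required to have corroborating bibliographic evidence. The sketch below accepts a name match only when the two authority records are associated with at least one shared title. This is a simplified stand-in for VIAF's matching, assuming each authority record has already been linked to the titles of its bibliographic records; the data layout and function names are hypothetical.

```python
import re
import unicodedata


def norm(text: str) -> str:
    """Case-fold and strip diacritics and punctuation (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    return " ".join(re.sub(r"[^\w\s]", " ", plain).split())


def match_authorities(file_a, file_b):
    """file_a, file_b: dicts of authority id -> (heading, set of titles).
    Return (id_a, id_b) pairs whose headings match AND that share a title."""
    by_name = {}
    for id_b, (heading, titles) in file_b.items():
        by_name.setdefault(norm(heading), []).append((id_b, {norm(t) for t in titles}))
    matches = []
    for id_a, (heading, titles) in file_a.items():
        titles_a = {norm(t) for t in titles}
        for id_b, titles_b in by_name.get(norm(heading), []):
            if titles_a & titles_b:  # corroborating bibliographic evidence
                matches.append((id_a, id_b))
    return matches
```

In the test data, two different people share the heading "Müller, Hans"; a name-only match would link both to the single German record, while the title check keeps only the correct pair.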
Registries
Show relationships between metadata
Often associated with an identifier
General solution?
Examples:
• Authority files
• WorldCat
• PURLs
PURLs
Persistent URLs:
• Map one URL to another
• http://purl.org/hickey/outgoing -> http://outgoing.typepad.com/
• 500,000+ PURLs
• 111 million resolutions
Port to the WikiD platform?
• http://www.oclc.org/research/projects/wikid/
A string of PURL servers?
• Use OAI-PMH for synchronization
• Spread responsibility
Generalized solution?
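At its core a PURL server is a lookup table that answers a request for the persistent URL with an HTTP redirect to the current target. The in-memory sketch below assumes a 302 (Found) redirect and counts resolutions; it is not the PURL server's actual code, and the class and method names are hypothetical.

```python
from http import HTTPStatus


class PurlResolver:
    """Minimal in-memory PURL table: persistent URL -> current target URL."""

    def __init__(self):
        self._table = {}
        self.resolutions = 0  # analogous to the slide's resolution count

    def register(self, purl: str, target: str) -> None:
        """Create or update a mapping; updating is how a moved resource is fixed."""
        self._table[purl] = target

    def resolve(self, purl: str):
        """Return (status, headers) as an HTTP redirect response would carry them."""
        target = self._table.get(purl)
        if target is None:
            return HTTPStatus.NOT_FOUND, {}
        self.resolutions += 1
        return HTTPStatus.FOUND, {"Location": target}
```

Usage with the slide's example: `register("http://purl.org/hickey/outgoing", "http://outgoing.typepad.com/")`, after which resolving the PURL yields a 302 with that `Location`. When the target moves, only the table entry changes; citations of the PURL stay valid.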
More connectivity
• OpenURL
• RSS feeds
• OpenSearch, SRU/W
• OAI-PMH
OpenURL
Developed to address the ‘appropriate copy’ problem
Transitioning to OpenURL 1.0
OpenURL resolver:
• Accepts requests specifying a resource and services
Generalized syntax:
• Specifying a resource
• Services to be performed
Metadata elements specified in a registry:
• http://purl.org/openurl/
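An OpenURL 1.0 request in key/encoded-value (KEV) form describes the referent with `rft.*` keys and names the metadata format with `rft_val_fmt`. The sketch builds such a URL for a book using the Z39.88-2004 KEV book format; the resolver base URL is a placeholder assumption, service (`svc.*`) keys are omitted, and `openurl_book` is a hypothetical helper.

```python
from urllib.parse import urlencode


def openurl_book(resolver_base: str, isbn: str, title=None, author=None) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV request describing a book."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # KEV book metadata format
        "rft.isbn": isbn,
    }
    if title:
        params["rft.btitle"] = title
    if author:
        params["rft.au"] = author
    return f"{resolver_base}?{urlencode(params)}"
```

The resolver at the (hypothetical) base URL decides which copy or service suits the user's institution; the requesting site only describes the resource.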
SRU
Simplified version of Z39.50:
• Web based
• SRW: SOAP
• SRU: URL
Even simpler?
• OpenSearch
• No search syntax
• Looking for common ground
MXG:
• Metasearch XML Gateway
• Simplifies metasearchers' lives
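The URL flavor of SRU puts the whole search in query parameters, with the query written in CQL. The sketch builds an SRU 1.1 `searchRetrieve` request; the base URL is a placeholder, and the index name `dc.title` and record schema `dc` are assumptions, since supported indexes and schemas depend on the server.

```python
from urllib.parse import urlencode


def sru_search_url(base: str, cql_query: str, max_records: int = 10,
                   start_record: int = 1, schema: str = "dc") -> str:
    """Build an SRU 1.1 searchRetrieve request for a plain HTTP GET."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL, e.g. 'dc.title = "catalog"'
        "startRecord": start_record,
        "maximumRecords": max_records,
        "recordSchema": schema,      # server-dependent schema name
    }
    return f"{base}?{urlencode(params)}"
```

Because the request is just a URL, it can be issued from a browser or any HTTP client without a SOAP toolkit, which is the contrast the slide draws with SRW.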
OAI-PMH
Method of harvesting metadata:
• More generally, a way of synchronizing databases
No real restriction to metadata; becomes a repository protocol:
• Identifiers
• Timestamps
Layered implementation:
• OAI
• SRU
• Pears
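Identifiers and timestamps are what make OAI-PMH usable for synchronization: a harvester asks for records changed since a date and follows `resumptionToken`s until the list is complete. The sketch below implements that loop for `ListRecords` with the standard library. The HTTP layer is left to a caller-supplied `fetch` function (a hypothetical parameter) so the loop can be tested offline; error responses and deleted-record status are not handled.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(fetch, metadata_prefix="oai_dc", from_date=None):
    """Yield (identifier, datestamp) for each record, following resumptionTokens.

    `fetch` takes a dict of OAI-PMH request parameters and returns response XML bytes.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvest based on timestamps
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            yield (header.findtext("oai:identifier", namespaces=OAI_NS),
                   header.findtext("oai:datestamp", namespaces=OAI_NS))
        token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # absent or empty token: the list is complete
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Storing the latest datestamp seen and passing it as `from_date` on the next run turns repeated harvests into database synchronization.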
Efficient processing
• Beowulf cluster
• Map-reduce
• Text searching
Beowulf Cluster
24 nodes:
• 2 processors, 4 gigabytes of RAM, 120 gigabytes of disk per node
• Gigabit network
Use it for:
• FRBR processing
• Text indexing
• Text searching
~30-fold speed-up on many tasks:
• 1 year ⇒ 2 weeks
• 1 week ⇒ 1 day
• 1 day ⇒ 1 hour
• 1 hour ⇒ 2 minutes
Extremely cheap processing
Map reduce
Pioneered by Google:
• Petabytes of data on thousands of nodes
Adapted to our cluster:
• Tens of gigabytes of data on dozens of nodes
Simple functional programming paradigm
Allows batch processing across the cluster
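The functional paradigm is small: a map function turns each input record into (key, value) pairs, the framework groups values by key, and a reduce function combines each group. The single-process sketch below shows the three phases using word counting, a stand-in task chosen for illustration rather than one named on the slide. On a cluster, the map and reduce calls would be spread across nodes, and the shuffle would move data between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record_id, text):
    """Mapper: emit one (word, 1) pair per word in the record."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce over (record_id, text) pairs in one process."""
    mapped = chain.from_iterable(map_phase(rid, text) for rid, text in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Because mappers see one record at a time and reducers see one key at a time, neither needs shared state, which is what lets the same code run in batch across many nodes.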
Text Searching
Spread database across cluster
Two levels of aggregation:
• 3 servers per node
• 24-way aggregation
• Aggregators run across the cluster
SRU used:
• HTTP based
• SRW (SOAP) slowed it down
Open source software
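Two-level aggregation can be sketched as a merge of ranked hit lists: each node merges the hits from its own servers, then a top-level aggregator merges the node results. Keeping only the top k at each level still yields the exact global top k, because every globally top-ranked hit is also in its own server's top k. The mapping of the slide's figures onto these two levels, the (score, doc_id) hit format, and the function names are assumptions.

```python
import heapq


def merge_top_k(result_lists, k):
    """Merge (score, doc_id) hit lists from several sources into the overall top k."""
    return heapq.nlargest(k, (hit for hits in result_lists for hit in hits))


def search_cluster(nodes, k=10):
    """nodes: list of nodes, each a list of per-server hit lists.

    Level 1: each node merges its own servers' hits.
    Level 2: a top-level aggregator merges the per-node results.
    """
    per_node = [merge_top_k(servers, k) for servers in nodes]
    return merge_top_k(per_node, k)
```

Each level only ships k hits upward, so the traffic into the top aggregator stays small however large the database is.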
Better interfaces
More interactive:
• Live search
• Dewey Browser
Better connected
Post-coordination of Services
Systems that expose low-level services
Higher-level coordination of those services
Loosely coupled services
Examples from OCLC:
• Validation service
• RSS feeds
• SRU
• OpenURL, OAI-PMH
• xISBN
• DDC Browser built this way: very different interfaces have been built
DDC Browser XML
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>
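Since the DDC Browser service returns plain XML, any client can build its own interface on it. As a sketch of such a client, the function below reads the `<language>` and `<cell>` elements shown above into a dictionary of DDC class to record count. The element and attribute names come from the sample; `ddc_counts` is a hypothetical name, and fetching the XML over HTTP is left out.

```python
import xml.etree.ElementTree as ET


def ddc_counts(xml_bytes: bytes):
    """Return (language, {ddc_class: count}) from a DDC Browser <cells> response."""
    root = ET.fromstring(xml_bytes)  # the xml-stylesheet instruction is ignored
    language = root.findtext("language")
    counts = {cell.get("ddc"): int(cell.get("count")) for cell in root.iter("cell")}
    return language, counts
```

For the sample response, the counts for classes 330 to 339 sum to 102.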
Do We Need It?
Just have Google harvest everything
• Our experience with Google
• Fielded searching
• Reliable searching
Possibility of user-supplied metadata
Cost of good metadata
Cost of non-existent metadata
Conclusions
• Shift to remote users
• Online availability: trend towards centralization
• More flexibility in implementations
• Patrons are better served
• Less emphasis on physical collections
Thank you
T. Hickey, http://errol.oclc.org/laf/n82-54463.html
Swedish Library Association, 2005 October 28