OCoLR 20041025 #53928015 OCLCR
Making data work harder
Lorcan Dempsey, OCLC
OVGTSL 2005 Conference, Newark, May 11-13
Overview
Some context
Looking at data in action
• OpenWorldCat
• FRBR
• Data mining
Context: value
Amazoogle: what should we be doing that fits into a world they occupy? Where do we provide unique value?
ROI: libraries invest in data but do not extract as much value from it as they might. Unless we release more value, the argument for this investment becomes weaker.
User: how do we co-create value with users? What opportunities are there for mixing catalog data and user-contributed data?
Management intelligence: how do we use data better to inform management decisions?
Context: consequences
The role of the catalog? The role of structured data? The role of the library?
Data
• Open WorldCat
• FRBR
• WorldCat Wiki
• Management intelligence
FRBR
‘Interim FRBR’ in OWC
FRBR in research projects
• FictionFinder
• Curioser
• xISBN
• Algorithm
• Top 1000
FRBR in FirstSearch: late this year
Top Sets for Fiction (Records)
Record Keys
1,296 defoe, daniel\1661 1731/robinson crusoe
1,267 carroll, lewis\1832 1898/alices adventures in wonderland
971 cervantes saavedra, miguel de\1547 1616/don quixote
828 stevenson, robert louis\1850 1894/treasure island
689 twain, mark\1835 1910/adventures of huckleberry finn
624 twain, mark\1835 1910/adventures of tom sawyer
618 swift, jonathan\1667 1745/gullivers travels
Top Sets for Fiction (Holdings)
Holding Keys
29,043 twain, mark\1835 1910/adventures of huckleberry finn
26,088 carroll, lewis\1832 1898/alices adventures in wonderland
20,843 twain, mark\1835 1910/adventures of tom sawyer
19,410 defoe, daniel\1661 1731/robinson crusoe
18,566 cervantes saavedra, miguel de\1547 1616/don quixote
18,492 stevenson, robert louis\1850 1894/treasure island
18,123 dickens, charles\1812 1870/christmas carol
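The keys in the two lists above read like normalized author/dates/title strings. A minimal sketch of how records might be grouped into work sets under such a key follows. The normalization rules, the `work_key` helper, and the sample records are assumptions for illustration only, not OCLC's published FRBR algorithm.

```python
import re
from collections import Counter

def _norm(s: str) -> str:
    """Lowercase, drop apostrophes, turn other punctuation into spaces."""
    s = s.lower().replace("'", "").replace("\u2019", "")
    s = re.sub(r"[^a-z0-9,\s]", " ", s)
    return re.sub(r"\s+", " ", s).strip()

def work_key(name: str, dates: str, title: str) -> str:
    """Hypothetical work-set key shaped like the keys in the lists above."""
    return _norm(name) + "\\" + _norm(dates) + "/" + _norm(title)

# Invented sample records: (author name, dates, title, number of holdings)
records = [
    ("Twain, Mark", "1835-1910", "Adventures of Huckleberry Finn", 120),
    ("Twain, Mark", "1835-1910", "Adventures of Huckleberry Finn.", 80),
    ("Carroll, Lewis", "1832-1898", "Alice's Adventures in Wonderland", 150),
]

record_counts = Counter()   # records per work set
holding_counts = Counter()  # holdings per work set
for name, dates, title, holdings in records:
    key = work_key(name, dates, title)
    record_counts[key] += 1
    holding_counts[key] += holdings

for key, n in record_counts.most_common():
    print(n, holding_counts[key], key)
```

Ranking by `record_counts` or by `holding_counts` gives different orderings, which is the contrast the two lists draw.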
Taking FRBR onto the open web
Curio(u)ser
MetaWiki
WIKI – web pages
metaWIKI – data
Capture user input in structured ways
Extending Wiki’s utility
Wiki:
supported markup:
• wikitext
page editing:
• a single text block
searches:
• full text searching
collections managed:
• one per wiki

MetaWiki:
supported markup:
• wikitext
• structured data (e.g., MARC, METS, DC…)
page editing:
• a single text block, or,
• field level
searches:
• full text searching
• fielded searching
collections managed:
• one/multiple per OaiWiki
Lorcan: note that this is a work in progress
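To make the difference between full text searching and fielded searching concrete, here is a small sketch. The page records, field names, and both search functions are hypothetical; they do not describe how MetaWiki itself is implemented.

```python
# Invented pages: each is a set of named fields plus free wikitext.
pages = [
    {"title": "Treasure Island",
     "creator": "Stevenson, Robert Louis",
     "text": "A reader note comparing this to Twain."},
    {"title": "Adventures of Tom Sawyer",
     "creator": "Twain, Mark",
     "text": "Set on the Mississippi."},
]

def full_text_search(pages, term):
    """Match the term anywhere in any field of the page."""
    term = term.lower()
    return [p for p in pages
            if any(term in str(v).lower() for v in p.values())]

def fielded_search(pages, field, term):
    """Match the term only inside one named field."""
    term = term.lower()
    return [p for p in pages if term in p.get(field, "").lower()]

print([p["title"] for p in full_text_search(pages, "twain")])
print([p["title"] for p in fielded_search(pages, "creator", "twain")])
```

The full text search returns both pages, while the fielded search returns only the page whose creator field matches.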
Management intelligence
So we have all this data – what can it tell us?
Several projects underway: only some discussed here
Making Data Work Harder
Activities “shed” data:
• Cataloging: bibliographic information
• Web site traffic: transaction logs
• Reference queries: search term lists
Need to mine this data for intelligence that creates value for libraries and users
OCLC Research undertaking a number of data-mining projects aimed at:
• Knowing more about the characteristics of library collections
• Creating interesting and useful data displays
• Generating intelligence to support library decision-making
Data mining
OCLC has a new collection analysis service
Some research projects looking at systemic questions described here.
Looking at Library Print Book Collections … Systematically
32 million print books, representing 26 million distinct works
Half of print books published after 1977; more than 80% still “in copyright”
Rareness is common! Only a third of print books have more than five holdings; half have two or fewer
OCLC/Ithaka collaboration: Use WorldCat to characterize the “system-wide” print book collection, i.e., aggregate print book holdings in WorldCat
Intelligence of this kind can help establish digitization priorities and inform preservation planning
More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt
Only about 120,000 works had both print book and e-book manifestations
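The “rareness” figures above are shares of a holdings distribution. A minimal sketch of that calculation follows, assuming one holdings count per print book; the sample counts are invented and do not come from WorldCat.

```python
def holdings_shares(holdings_per_book):
    """Return (share with more than five holdings, share with two or fewer)."""
    n = len(holdings_per_book)
    more_than_five = sum(1 for h in holdings_per_book if h > 5) / n
    two_or_fewer = sum(1 for h in holdings_per_book if h <= 2) / n
    return more_than_five, two_or_fewer

# Invented sample: number of holding libraries for each of ten books
sample = [1, 1, 2, 2, 3, 4, 6, 9, 40, 1]
widely_held, rare = holdings_shares(sample)
print(f"more than five holdings: {widely_held:.0%}")
print(f"two or fewer holdings: {rare:.0%}")
```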
The Implications of GooglePrint …
Potentially covers about one third of print books in WorldCat
~60 percent of “GooglePrint” books held by only one of the Google 5
Less than 5 percent held by all of the Google 5
~20 percent of “GooglePrint” books out of copyright
Paper forthcoming …
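The overlap figures above can be sketched as set arithmetic over holdings. In this illustration the five library labels and the sample holdings are placeholders, not the actual Google 5 institutions or WorldCat data.

```python
# Placeholder labels standing in for the five partner libraries.
GOOGLE5 = {"A", "B", "C", "D", "E"}

def overlap_shares(holders_by_book):
    """Among books held by at least one partner library, return the
    share held by exactly one and the share held by all five."""
    partner_holders = {
        book: holders & GOOGLE5
        for book, holders in holders_by_book.items()
        if holders & GOOGLE5
    }
    n = len(partner_holders)
    only_one = sum(1 for h in partner_holders.values() if len(h) == 1) / n
    all_five = sum(1 for h in partner_holders.values() if h == GOOGLE5) / n
    return only_one, all_five

# Invented holdings; "X" is a library outside the five.
sample = {
    "book1": {"A"},
    "book2": {"B", "X"},
    "book3": {"A", "B", "C", "D", "E"},
    "book4": {"C", "D"},
    "book5": {"X"},
}
print(overlap_shares(sample))
```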
Know Your Audience!
Implies: we can infer materials’ audience level from holdings patterns, which in turn can support:
• Collection management
• Readers’ advisory services
• Reference services
• Information retrieval
Holdings represent selection decisions by librarians … implies there are about 1 billion individual selection decisions in the WorldCat holdings file
Selections are made to serve the interests of a library’s target community …
• Associate target community (audience level) to particular library profiles, e.g., ARL, non-ARL academic, public, K-12 school …
Paper forthcoming!
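One way to picture inferring audience level from holdings patterns is a holdings-weighted average over library types. This is only a sketch: the library-type weights and the 0-to-1 scale are assumptions made for illustration, not the method in the forthcoming paper.

```python
# Hypothetical weights: higher means a more specialized audience.
WEIGHTS = {"k12": 0.1, "public": 0.3, "academic": 0.6, "arl": 0.9}

def audience_level(holdings_by_type):
    """Holdings-weighted average of library-type weights, or None
    when the item has no holdings."""
    total = sum(holdings_by_type.values())
    if total == 0:
        return None
    weighted = sum(WEIGHTS[t] * n for t, n in holdings_by_type.items())
    return weighted / total

# Invented holdings counts by library type for two items
picture_book = {"k12": 400, "public": 500, "academic": 50, "arl": 10}
monograph = {"k12": 0, "public": 5, "academic": 120, "arl": 90}
print(round(audience_level(picture_book), 2))
print(round(audience_level(monograph), 2))
```

An item held mostly by school and public libraries scores low; one held mostly by research libraries scores high.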
“Last Copy”: Identifying At-Risk Materials
~23 million WorldCat records have only a single holding attached
Libraries need to know what portions of their collections are:
• Rare …
• Rare and valuable …
• “Last copy” (artifact and/or content)
Identification of rare materials essential intelligence in support of storage, digitization, and preservation decision-making
Data-mining study of Vanderbilt holdings in WorldCat:
• Identified 23,000 items held uniquely by Vanderbilt
• ~60% are print books
• ~60% produced prior to 1950; ~25% produced after 1970
Paper forthcoming!
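Finding single-holding records, and the items one library holds uniquely, is a simple filter over a holdings map. In this sketch the record identifiers and library symbols are invented, and the functions are hypothetical rather than the study's actual tooling.

```python
def single_holding_records(holdings):
    """Records with exactly one holding attached."""
    return sorted(rec for rec, libs in holdings.items() if len(libs) == 1)

def uniquely_held(holdings, library):
    """Records whose only holding belongs to the given library."""
    return sorted(rec for rec, libs in holdings.items() if libs == {library})

# Invented holdings map: record id -> set of holding-library symbols
holdings = {
    "rec1": {"LIB_A"},
    "rec2": {"LIB_A", "LIB_B"},
    "rec3": {"LIB_B"},
    "rec4": {"LIB_A"},
}
print(single_holding_records(holdings))
print(uniquely_held(holdings, "LIB_A"))
```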
Thank you!
OCLC Research:
http://www.oclc.org/research/
Lorcan:
http://orweblog.oclc.org/