OCoLR 20041025 #53928015 OCLCR Making data work harder Lorcan Dempsey OCLC OVGTSL 2005 Conference Newark, May 11-13


Making data work harder

Lorcan Dempsey, OCLC

OVGTSL 2005 Conference, Newark, May 11-13


Overview

Some context

Looking at data in action:
• Open WorldCat
• FRBR
• Data mining


Context: value

Amazoogle: what should we be doing that fits into a world that they occupy? Where do we provide unique value?

ROI: libraries invest in data but do not extract as much value as they might from it. Unless we release more value, the argument for this investment becomes weaker.

User: how do we co-create value with users? What opportunities are there for mixing catalog data and user-contributed data?

Management intelligence: how do we use data better to inform management decisions?


Context: consequences

The role of the catalog? The role of structured data? The role of the library?


Data

• Open WorldCat
• FRBR
• WorldCat Wiki
• Management intelligence


FRBR

‘Interim FRBR’ in Open WorldCat (OWC)

FRBR in research projects:
• FictionFinder
• Curioser
• xISBN
• Algorithm
• Top 1000

FRBR in FirstSearch – late this year


Top Sets for Fiction (Records)

Records   Key
1,296     defoe, daniel\1661 1731/robinson crusoe
1,267     carroll, lewis\1832 1898/alices adventures in wonderland
971       cervantes saavedra, miguel de\1547 1616/don quixote
828       stevenson, robert louis\1850 1894/treasure island
689       twain, mark\1835 1910/adventures of huckleberry finn
624       twain, mark\1835 1910/adventures of tom sawyer
618       swift, jonathan\1667 1745/gullivers travels


Top Sets for Fiction (Holdings)

Holdings  Key
29,043    twain, mark\1835 1910/adventures of huckleberry finn
26,088    carroll, lewis\1832 1898/alices adventures in wonderland
20,843    twain, mark\1835 1910/adventures of tom sawyer
19,410    defoe, daniel\1661 1731/robinson crusoe
18,566    cervantes saavedra, miguel de\1547 1616/don quixote
18,492    stevenson, robert louis\1850 1894/treasure island
18,123    dickens, charles\1812 1870/christmas carol
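The work keys above appear to follow an author\dates/title pattern, lowercased and stripped of punctuation and leading articles, and the two rankings differ because one counts records per work set while the other sums holdings. The Python sketch below illustrates both ideas under stated assumptions: the `work_key` normalization rules are inferred only from the shape of the keys, and the sample records are toy values. Neither is OCLC's actual FRBR work-set algorithm or WorldCat data.

```python
import re
from collections import Counter

# Assumed normalization rules, inferred only from the shape of the keys on the slides.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalize(text: str, keep_commas: bool = False) -> str:
    """Lowercase, drop apostrophes, turn other punctuation (commas optionally kept) into spaces."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    pattern = r"[^a-z0-9, ]" if keep_commas else r"[^a-z0-9 ]"
    return " ".join(re.sub(pattern, " ", text).split())


def work_key(author: str, dates: str, title: str) -> str:
    """Build a hypothetical 'author\\dates/title' key, dropping one leading article from the title."""
    t = normalize(title)
    for article in LEADING_ARTICLES:
        if t.startswith(article):
            t = t[len(article):]
            break
    return f"{normalize(author, keep_commas=True)}\\{normalize(dates)}/{t}"


# Toy input: (author, dates, title, holdings attached to that record). Not WorldCat data.
records = [
    ("Twain, Mark", "1835-1910", "The Adventures of Huckleberry Finn", 40),
    ("Twain, Mark", "1835-1910", "Adventures of Huckleberry Finn", 25),
    ("Defoe, Daniel", "1661-1731", "Robinson Crusoe", 10),
    ("Defoe, Daniel", "1661-1731", "Robinson Crusoe.", 8),
    ("Defoe, Daniel", "1661-1731", "ROBINSON CRUSOE", 5),
]

records_per_set = Counter()
holdings_per_set = Counter()
for author, dates, title, holdings in records:
    key = work_key(author, dates, title)
    records_per_set[key] += 1          # each bibliographic record counts once toward its work set
    holdings_per_set[key] += holdings  # library holdings are summed across the set's records

print("By records: ", records_per_set.most_common())
print("By holdings:", holdings_per_set.most_common())
```

With this toy input, Robinson Crusoe heads the record ranking (three records) while Huckleberry Finn heads the holdings ranking (65 holdings), the same kind of reversal the two lists show.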


Taking FRBR onto the open web

Curio(u)ser


MetaWiki

Wiki – web pages

MetaWiki – data

Capture user input in structured ways


Extending Wiki’s utility

Wiki:
• Supported markup: wikitext
• Page editing: a single text block
• Searches: full text searching
• Collections managed: one per wiki

MetaWiki:
• Supported markup: wikitext; structured data (e.g., MARC, METS, DC…)
• Page editing: a single text block, or field level
• Searches: full text searching; fielded searching
• Collections managed: one/multiple per OaiWiki
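The Wiki/MetaWiki contrast comes down to whether a page is only a block of text or also a set of named fields. A minimal Python sketch, assuming a hypothetical `Page` model with Dublin Core-style field names (`creator`, `subject`), shows field-level editing and the difference between full text and fielded searching. It is an illustration of the idea, not how MetaWiki itself is built.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """A wiki page that can also carry structured, named fields (hypothetical model)."""
    title: str
    text: str = ""                                               # Wiki-style single text block
    fields: dict[str, list[str]] = field(default_factory=dict)  # MetaWiki-style structured data

    def set_field(self, name: str, values: list[str]) -> None:
        """Field-level editing: replace one element without rewriting the whole page."""
        self.fields[name] = list(values)


def full_text_search(pages: list[Page], term: str) -> list[str]:
    """Return titles of pages whose text block or any field value contains `term`."""
    term = term.lower()
    hits = []
    for page in pages:
        values = [v for vals in page.fields.values() for v in vals]
        if term in " ".join([page.text, *values]).lower():
            hits.append(page.title)
    return hits


def fielded_search(pages: list[Page], name: str, term: str) -> list[str]:
    """Return titles of pages where `term` occurs in the named field only."""
    term = term.lower()
    return [p.title for p in pages if any(term in v.lower() for v in p.fields.get(name, []))]


pages = [
    Page("Robinson Crusoe", "A novel about a castaway.",
         {"creator": ["Defoe, Daniel"], "subject": ["Castaways"]}),
    Page("Notes on Defoe", "Short essays that mention Defoe's journalism."),
]
pages[1].set_field("subject", ["Journalism"])

print(full_text_search(pages, "defoe"))
print(fielded_search(pages, "creator", "defoe"))
```

A full text search for "defoe" matches both pages, while a fielded search on `creator` matches only the page that records Defoe as its creator, which is the precision gain structured data is meant to bring.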


Lorcan: note that this is a work in progress


Management intelligence

So we have all this data – what can it tell us?

Several projects underway: only some discussed here


Making Data Work Harder

Activities “shed” data:
• Cataloging: bibliographic information
• Web site traffic: transaction logs
• Reference queries: search term lists

Need to mine this data for intelligence that creates value for libraries and users

OCLC Research undertaking a number of data-mining projects aimed at:
• Knowing more about the characteristics of library collections
• Creating interesting and useful data displays
• Generating intelligence to support library decision-making


Data mining

OCLC has a new collection analysis service

Some research projects looking at systemic questions are described here.


Looking at Library Print Book Collections … Systematically

32 million print books, representing 26 million distinct works

Half of print books published after 1977; more than 80% still “in copyright”

Rareness is common! Only a third of print books have more than five holdings; half have two or fewer

OCLC/Ithaka collaboration: Use WorldCat to characterize the “system-wide” print book collection – i.e., aggregate print book holdings in WorldCat

Intelligence of this kind can help establish digitization priorities and inform preservation planning

More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt

Only about 120,000 works had both print book and e-book manifestations
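The “rareness is common” figures are shares of a holdings-count distribution. Assuming one holdings count per print book, a minimal Python sketch of computing those two shares follows; the thresholds mirror the slide, and the sample counts are toy values rather than WorldCat figures.

```python
from __future__ import annotations


def holdings_profile(holdings_per_title: list[int]) -> dict[str, float]:
    """Share of titles with more than five holdings, and share with two or fewer."""
    n = len(holdings_per_title)
    if n == 0:
        raise ValueError("need at least one title")
    more_than_five = sum(1 for h in holdings_per_title if h > 5)
    two_or_fewer = sum(1 for h in holdings_per_title if h <= 2)
    return {"more_than_five": more_than_five / n, "two_or_fewer": two_or_fewer / n}


# Toy input: number of libraries holding each print book (illustrative, not WorldCat data).
sample = [1, 1, 2, 2, 3, 4, 7, 12, 1, 40]
print(holdings_profile(sample))
```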


The Implications of GooglePrint …

Potentially covers about one third of print books in WorldCat

~60 percent of “GooglePrint” books held by only one of the Google 5

Less than 5 percent held by all of the Google 5

~20 percent of “GooglePrint” books out of copyright
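Overlap figures like these come from asking, for each book, how many of the five partner libraries hold it. A minimal Python sketch of that tally follows, assuming a simple in-memory map from book to holding libraries; the `LIB_A` through `LIB_E` symbols and the toy data are placeholders, not the actual partner libraries or WorldCat holdings.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical placeholder symbols standing in for the five partner libraries.
GOOGLE5 = frozenset({"LIB_A", "LIB_B", "LIB_C", "LIB_D", "LIB_E"})


def google5_overlap(holdings: dict[str, set[str]]) -> dict[str, float]:
    """Coverage of all books, plus held-by-one and held-by-all shares among covered books."""
    partner_counts = [len(libs & GOOGLE5) for libs in holdings.values()]
    covered = [c for c in partner_counts if c > 0]  # books held by at least one partner
    if not covered:
        raise ValueError("no book is held by any partner library")
    tally = Counter(covered)
    return {
        "coverage": len(covered) / len(partner_counts),
        "held_by_one": tally[1] / len(covered),
        "held_by_all": tally[len(GOOGLE5)] / len(covered),
    }


# Toy holdings map: book id -> set of holding-library symbols (illustrative only).
books = {
    "b1": {"LIB_A"},
    "b2": {"LIB_B", "OTHER"},
    "b3": {"LIB_A", "LIB_B", "LIB_C", "LIB_D", "LIB_E"},
    "b4": {"LIB_C", "LIB_D"},
    "b5": {"OTHER"},
}
print(google5_overlap(books))
```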

Paper forthcoming …


Know Your Audience!

Implies: we can infer materials’ audience level from holdings patterns, which in turn can support:
• Collection management
• Readers’ advisory services
• Reference services
• Information retrieval

Holdings represent selection decisions by librarians … implies there are about 1 billion individual selection decisions in the WorldCat holdings file

Selections are made to serve the interests of a library’s target community …
• Associate target community (audience level) to particular library profiles, e.g., ARL, non-ARL academic, public, K-12 school …

Paper forthcoming!

?
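One simple reading of “infer audience level from holdings patterns” is a weighted average over the profiles of the libraries that hold an item. The sketch below does exactly that; the profile labels follow the slide, but the numeric weights are arbitrary assumptions for illustration, not the method of the forthcoming paper.

```python
from __future__ import annotations

# Assumed audience weights per library profile: 0.0 = general/young readers, 1.0 = research level.
AUDIENCE_WEIGHT = {"k12": 0.0, "public": 0.25, "non_arl_academic": 0.75, "arl": 1.0}


def audience_level(holding_profiles: list[str]) -> float:
    """Average the weights of the library profiles holding an item; unknown profiles are skipped."""
    weights = [AUDIENCE_WEIGHT[p] for p in holding_profiles if p in AUDIENCE_WEIGHT]
    if not weights:
        raise ValueError("no holdings with a known library profile")
    return sum(weights) / len(weights)


# Toy holdings: the profile of each library that holds the item (illustrative only).
picture_book = ["k12", "k12", "public", "public"]
monograph = ["arl", "arl", "non_arl_academic"]
print(audience_level(picture_book), audience_level(monograph))
```

A score near 0 suggests an item selected mainly by school and public libraries; a score near 1 suggests one selected mainly by research libraries.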


“Last Copy”: Identifying At-Risk Materials

~23 million WorldCat records have only a single holding attached

Libraries need to know what portions of their collections are: rare … rare and valuable … “last copy” (artifact and/or content)

Identification of rare materials is essential intelligence in support of storage, digitization, and preservation decision-making

Data-mining study of Vanderbilt holdings in WorldCat:
• Identified 23,000 items held uniquely by Vanderbilt
  • ~60% are print books
  • ~60% produced prior to 1950; ~25% produced after 1970
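At its simplest, finding a library’s “last copy” items is a filter over the holdings file for records whose only attached holding is that library, followed by profiling the result (for example, by publication date). A minimal Python sketch over a toy in-memory holdings map follows; the record ids, the `LIB_X` and `LIB_Y` symbols, and the years are hypothetical.

```python
from __future__ import annotations


def held_uniquely_by(holdings: dict[str, set[str]], library: str) -> list[str]:
    """Record ids whose only attached holding is `library` (sorted for stable output)."""
    return sorted(rec for rec, libs in holdings.items() if libs == {library})


def share_published_before(ids: list[str], pub_year: dict[str, int], year: int) -> float:
    """Fraction of the given records published before `year`."""
    if not ids:
        raise ValueError("no records to measure")
    return sum(1 for r in ids if pub_year[r] < year) / len(ids)


# Toy holdings map and publication years; "LIB_X" and "LIB_Y" are hypothetical library symbols.
holdings = {
    "r1": {"LIB_X"},
    "r2": {"LIB_X", "LIB_Y"},
    "r3": {"LIB_X"},
    "r4": {"LIB_Y"},
}
pub_year = {"r1": 1890, "r2": 1960, "r3": 1985, "r4": 1920}

unique = held_uniquely_by(holdings, "LIB_X")
print(unique, share_published_before(unique, pub_year, 1950))
```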

Paper forthcoming!


Thank you!

OCLC Research:

http://www.oclc.org/research/

Lorcan:

http://orweblog.oclc.org/