RLG Programs Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management Constance Malpas Program Officer RLG Programs


RLG Programs

Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management

Constance Malpas, Program Officer, RLG Programs


RLG Programs Managing Last Copies

CCDO Meeting, ALA Midwinter – 12 January 2008


This presentation

Summarizes recent data-mining efforts by OCLC Programs and Research
  System-wide sample (Summer 2007 – Spring 2008)
  ARL unique print books (Autumn 2007)
Suggests implications for collection managers
Outlines next steps for RLG Programs
An opportunity to discuss what additional evidence and analysis is needed


What we mean by ‘last copy’

Monographic title uniquely-held by a single WorldCat contributor
  Cf. ‘single copy’ repositories, where ‘last copy’ is relative to local/group holdings
May represent a last manifestation, expression or work
  Bibliographic records describe manifestations, not copies; unique manifestations are the point of departure for analysis
Some are intrinsically unique; others are rendered unique by erosion of system-wide holdings
  Historical data may help document increased copy or work-level availability, but weren’t included in the studies presented here
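The working definition above (a title held by exactly one contributor) can be sketched in a few lines. This is only an illustration under stated assumptions: the `(record_id, holder_symbol)` pairs, the function name `uniquely_held` and the sample symbols are hypothetical, and the sketch does not reflect how the studies actually queried WorldCat.

```python
from collections import defaultdict

def uniquely_held(holdings):
    """Return {record_id: holder} for records held by exactly one contributor.

    `holdings` is an iterable of (record_id, holder_symbol) pairs. Both names
    are hypothetical stand-ins for a bibliographic record number and a
    holding-library symbol.
    """
    holders = defaultdict(set)
    for record_id, symbol in holdings:
        holders[record_id].add(symbol)
    # Keep only records whose set of distinct holders has a single member.
    return {rid: next(iter(syms)) for rid, syms in holders.items() if len(syms) == 1}

# Hypothetical holdings lines (not study data).
sample = [
    ("rec-1", "AAA"),
    ("rec-1", "BBB"),
    ("rec-2", "AAA"),
    ("rec-3", "CCC"),
    ("rec-3", "CCC"),  # repeated holding line from the same library
]
unique = uniquely_held(sample)
```

Note that the unit counted here is the bibliographic record (a manifestation), not a physical copy, which matches the point of departure described on the slide.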


Distribution of uniquely-held print books in ARL member institutions

[Figure: bar chart. Y-axis: unique titles (0 to 700,000). X-axis: ARL member institution. Axis labels shown: LC, Yale, Alberta, Columbia, U Chicago, UCLA, McGill, Penn, UVa, Hawaii, U Md, San Diego, SUNY Buffalo, Rutgers, Dartmouth, Notre Dame, Oregon, GA Tech, Delaware, Florida State, So Illinois, Alabama, Irvine, GWU, Wayne State, York, Virginia Tech, WA State, Case Western, Manitoba, Howard.]

Distribution of wealth: ARL unique books

A classic Pareto distribution

20% of the population holds >75% of unique titles

Median institutional holdings = 19K titles

Institutional excellence, or a “network effect”?

N = 6.95 M titles
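The concentration figures on this slide (share held by the top 20% of institutions, and the median) are simple to compute once per-institution counts are available. A minimal sketch, assuming a plain list of counts; the ten values below are invented for illustration and are not the ARL data, and `top_share` is a hypothetical helper name.

```python
import math
from statistics import median

def top_share(counts, fraction=0.20):
    """Share of all titles held by the top `fraction` of institutions."""
    ranked = sorted(counts, reverse=True)
    # Round the group size up so that at least one institution is included.
    k = max(1, math.ceil(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical unique-title counts for ten institutions (not study data).
holdings = [600_000, 250_000, 40_000, 30_000, 20_000,
            18_000, 15_000, 12_000, 10_000, 5_000]

share = top_share(holdings)   # fraction held by the two largest institutions
mid = median(holdings)        # midpoint of the per-institution counts
```

A median far below the mean, together with a large top-20% share, is the signature of the skew described above.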


Why focus on uniquely-held titles?

“Scarcity is common”
  limited redundancy in holdings = limited preservation guarantee, limited opportunity to create economies of scale by aggregating supply
Research institutions bear the brunt of responsibility for long-term preservation and access of unique titles
  Academic and independent research libraries hold up to 70% of aggregate unique print book collection
  Continuing costs of managing (storing, providing access to) print collections are high; use is generally declining
  Space pressure on physical plant (on-campus, remote) is high; understanding distribution and characteristics of unique holdings can inform decisions about disposition of physical collection
Increased attention to stewardship of special collections
  ARL SCWG, CLIR, LC Task Force on Bibliographic Control – new attention to what constitutes ‘special’ collections, appropriate standards of care, modes and metrics of use


Challenges

Identification requires group / network view of holdings
  WorldCat provides a reasonable proxy for system-wide collection
Some materials (MSS, theses and dissertations, etc.) are intrinsically unique; not all can be algorithmically identified in MARC records
  hybrid approach combines computational and manual analysis of bibliographic data
Sparse bibliographic records impede efficient work/title matching, may introduce spurious measure of uniqueness
  external sources (including Google) sometimes helpful in filling gaps
Non-English titles (especially transliterated non-roman scripts) are particularly difficult to match
  we resisted the temptation to exclude these
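To make the matching problem concrete, here is a rough sketch of a normalized match key of the kind often used to find near-duplicate records. It is an assumption for illustration, not the matching method used in these studies; `match_key` is a hypothetical name, and the accented variant of the title is invented.

```python
import re
import unicodedata

def match_key(title, year=None):
    """Build a crude normalized key for comparing sparse title records."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    words = re.sub(r"[^a-z0-9]+", " ", stripped.lower()).split()
    key = " ".join(words)
    return f"{key}|{year}" if year is not None else key

a = match_key("Edo no akebono.", 1956)
b = match_key("  EDO NO AKEBONO ", 1956)
c = match_key("Édo no Akebono", 1956)  # invented variant with a diacritic
```

A key like this only helps with romanized records that differ in case, punctuation or diacritics. Records cataloged in original scripts would collapse to an empty key, which is one reason manual review remains part of a hybrid approach.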


Study I: System-wide Sampling

250 randomly selected, uniquely-held titles
  Limited to printed books (including theses) published before 2005
  English-language cataloging only
  Iterative re-sampling required to fill gaps
Independently reviewed by three project staff
  Level of uniqueness
  Material type
Results periodically collated for group analysis
  Compare results of individual analysis for consistency
  Seek consensus on difficult cases – relatively few of these
  Re-sample as necessary to fill gaps

White paper anticipated March 2008
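The sampling procedure (draw at random, reject out-of-scope records, re-sample to fill the gaps) can be sketched as follows. This is a minimal illustration under assumptions: the record tuples, the eligibility rule and the fixed seed are hypothetical, and the actual study workflow also involved manual review.

```python
import random

def draw_sample(pool, size, is_eligible, seed=0):
    """Draw `size` eligible records, re-sampling to replace rejected ones."""
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    accepted = []
    # Walk the shuffled pool; each rejected record is replaced by the next draw.
    for record in remaining:
        if len(accepted) == size:
            break
        if is_eligible(record):
            accepted.append(record)
    return accepted

def eligible(record):
    """Hypothetical scope rule: printed books published before 2005."""
    _, year, material = record
    return material == "book" and year < 2005

# Hypothetical records: (id, publication year, material type).
pool = [(i, 1900 + i % 120, "book" if i % 3 else "serial") for i in range(1000)]
sample = draw_sample(pool, 250, eligible)
```

Shuffling once and taking the first eligible records is equivalent to sampling without replacement and redrawing whenever a record falls out of scope.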


Study II: ARL uniquely-held books

Ad hoc analysis by RLG Programs, prompted by IMLS Connecting to Collections grant announcement
  How might the existing evidence base be used to focus regional preservation investments?
Based on January 2007 snapshot of WorldCat database: 13M records for titles (6.95M print books) uniquely held by ARL institutions; 300+ OCLC symbols; 123 institutions
  Iterative analysis examined relative impact of theses/dissertations and recent imprints on system-wide uniqueness; regional and institutional distribution of holdings
Findings shared with ARL Special Collections Working Group (October 2007) and selected RLG partner institutions (UC; CIC; ReCAP; Harvard; ASU; NYU)
  Heritage Preservation willing to share Heritage Health survey data for cross-tabulation on as-needed basis


Limitations

Current studies limited to printed books – excludes serials, special collections; only a partial measure of uniqueness in system-wide collection

Incomplete representation of world book collection; for non-English titles especially, uniqueness of North American holdings is only relative

Cataloging backlogs of up to 5 years mean that holdings for recent acquisitions are imperfectly reflected

Incomplete coverage of rare books and special collections prior to (ongoing) integration of RLG Union Catalog


Our findings – distribution of unique titles

Research and academic libraries hold >70% of aggregate unique print book collection
  while value and utility of these holdings may be widely distributed across the library community, holdings are concentrated at institutions with a research / teaching / learning mandate
  limited data on aggregate use, sources of demand
Institutional distribution of unique holdings is highly skewed, with a handful of libraries holding a majority share of collective assets
  ARL unique print book holdings range from 400 to 600K titles per institution; median holdings = 19K titles
  generally, institutions with large collections hold more unique materials – but absolute size of collection is not an indicator of relative uniqueness


Unique titles by library type

Based on a randomly selected sample of 250 uniquely-held print book titles in WorldCat (Jan. 2007)

[Figure: pie chart. Segment labels as extracted: 50%, 27%, 6%, 6%, 4%, 4%, 2%, 1%. Legend as extracted: ARL; Academic (non-ARL); Gov't; State and National; Special; Public; Unknown; Networks.]


Unique Print Books in ARL Institutions

National libraries and institutions with deep collections and an aggressive approach to collecting and cataloging new monographs – LC, Harvard, Library and Archives Canada – have an exceptional range of unique holdings

CRL’s focus on theses and dissertations is evident – most uniqueness is attributable to these holdings

Institutions with younger collections, actively seeking to increase scope of coverage – NCSU, Temple – are building uniqueness in new titles


Content-type Distributions: CRL and ARL

[Figure: bar chart, 0% to 100%, comparing the Center for Research Libraries with the ARL aggregate collection. Series: unique theses; unique print books pub'd 2000 and after; unique print books pub'd before 2000.]

Chart annotations:
  Intrinsically unique content, “only copies”
  May include “first copies” in cataloging queue; uniqueness subject to rapid erosion


Our findings – levels of uniqueness

~60% of titles represent unique works
  Ex: Report and recommendation … on a proposed loan … equivalent to US$70 million to the … Islamic Republic of Pakistan for a power plant efficiency improvement project (1987) – World Bank report held by George Washington University
~15% of titles represent unique manifestations
  Ex: Gallipolis … an account of the French five hundred and of the town they established … compiled by Workers of the Writers' program of the Work projects administration (1940) – microform pamphlet held by Yale University; related manifestations at 40 libraries
~5% of titles represent unique expressions
  Ex: E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908) – book held by Massanutten Regional Library, VA; similar title (Luck, Lock) by same author, pub’d in 1900, held at LC
~20% of titles not unambiguously unique: duplicate or near-duplicate records can be found in WorldCat
  Ex: K. Kimura. Edo no akebono (1956) – book held by Harvard Yenching; apparent duplicate (cataloged with original scripts) held by Waseda, Yale


Our findings – content characterization

Material types
  ~35% are books (>50pp)
    most appear to be non-fiction titles, less likely to have additional manifestations
  ~20% theses and dissertations
    many at Master’s level – unlikely to be held beyond issuing institution
  ~15% government documents
    mostly federal and state, may be duplicated in depositories
  ~10% pamphlets
    unique content, but rarely useful in isolation
  ~10% analytics; single articles or issues bound as a separate volume
    non-unique content
  <5% early imprints
    lost treasures?
  Small numbers of by-laws, scripts, legal briefs, minutes, etc.


Implications

Institutions with significant unique holdings may benefit from ‘splitting the difference’ between unique works and manifestations

unique manifestations and analytics should be judged with an eye to provenance history; unless they contribute to local distinctiveness, immediate action may not be warranted

A preliminary sort by material type may help guide local decision-making regarding the physical disposition of unique holdings

pamphlets and technical reports may be candidates for cataloging enhancement and storage transfer; books may be short-listed for digitization and/or transfer to special collections

Institutions with smaller unique print book collections may benefit from collective action to aggregate supply (through effective disclosure) and demand (through special resource-sharing and digitization initiatives) around specific topical and disciplinary interests

local collections gain in significance when presented in context with related holdings


Recommendations

Adopt a nuanced understanding of ‘relative uniqueness’ when assessing local holdings

Unique manifestations may not represent unique intellectual content, but may have other value
  As artifacts: special collections
  As a networked resource: increased availability
Unique works may gain relevance and value when presented as part of a larger disciplinary or topical collection
  Theses and dissertations may benefit from special discovery tools, integration in local scholarly communications initiatives
  Pamphlets and technical reports may be virtually aggregated for specific communities of use
Maximize disclosure of unique holdings to increase their impact and value
Focus on use and utility of unique holdings to ensure long-term preservation, enduring value to parent institution


What’s Next . . .

Holdings validation study will examine a sample of scarcely-held (<5 copies) US imprints in North-American research libraries
  Compare current WorldCat holdings to historical holdings – looking for signs of collection erosion; elimination of local backlogs (diminishing uniqueness)
  Compare local holdings to current WorldCat holdings – location changes/storage transfers, withdrawals
  Assess impact of local preservation actions on system-wide holdings (availability, condition) and potential value of ‘full disclosure’

Collaborative effort with RLG partner institutions anticipated Spring/Summer 2008
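The historical-versus-current comparison planned for the validation study amounts to set differences over holders per title. A minimal sketch, assuming each snapshot is a mapping from a title identifier to the set of holding symbols; the identifiers, symbols and the `compare_holdings` helper are hypothetical.

```python
def compare_holdings(historical, current):
    """Compare two {title_id: set_of_holders} snapshots title by title."""
    report = {}
    for title_id in historical.keys() | current.keys():
        before = historical.get(title_id, set())
        after = current.get(title_id, set())
        report[title_id] = {
            "lost": before - after,    # holders no longer reporting (possible erosion)
            "gained": after - before,  # newly reported holders (e.g. cleared backlogs)
            "now_unique": len(after) == 1 and len(before) > 1,
        }
    return report

# Hypothetical snapshots (not study data).
historical = {"t1": {"AAA", "BBB", "CCC"}, "t2": {"AAA"}}
current = {"t1": {"AAA"}, "t2": {"AAA", "DDD"}}
report = compare_holdings(historical, current)
```

In this toy data, `t1` has been rendered unique by erosion, while `t2` has lost its uniqueness because a second holder now reports it.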


Some closing observations

Opportunities
  Large research libraries hold a wealth of unique materials – long tail resources with broad potential audience
  Aggregated bibliographic data supports programmatic analysis and enrichment – work-level clustering, identification of duplicates
  Largest institutions, with enduring commitments to retention and access, hold majority of potential ‘at risk’ titles
Challenges
  Libraries ill-equipped to measure potential demand for unique holdings
  Technical and social infrastructure for aggregating supply is lacking
  University presses are potential distribution partners, but alliances are weak


Questions, Comments?

‘Managing the Collective Collection’ work agenda
  Data-mining for management intelligence
  Shared print collections

http://www.oclc.org/programs/ourwork/collectivecoll

Midwinter RLG Update Session, 1:30-3:30, Marriott 302-304

Contact: Constance Malpas, Program Officer, [email protected]


N = 5.9M titles

Median institutional holdings = 96K unique titles