Transcript

Taking Ownership of Electronic Journals & Books: A Tale of Two Repositories

CNI Spring 2012 Meeting

Alan Darnell, Director, Scholars Portal

420,000 FTE

http://www.ocul.on.ca

http://www.scholarsportal.info

Background

• since 2002, 21 Ontario University libraries have collaborated to acquire and manage shared collections of electronic journals and electronic books

• licensing happens nationally and provincially through Canadian Research Knowledge Network and OCUL

• repositories are managed by Scholars Portal, a unit of the University of Toronto Library system

Scope of Content

• Journals is a repository of over 25.9M full-text documents from 11,400 journals, supplemented by article metadata from JSTOR and Project MUSE, bringing total citations to 31.5M and 12,740 journals

• Books is a repository of over 460,000 titles, over 100,000 current and close to 360,000 digitized from various Canadian collections participating in the OCA

Content by Publisher

Journals Books

Costs

• about $29.6M of journal content added in 2011 and cumulatively well over $150 M since 2002

• Scholars Portal operation costs are $2.9M annually, with 1/3 of these resources devoted to managing Journals and Books

Goals

• Aggregate content for enhanced discovery

• Create framework to support long-term preservation of licensed content

• Reduce cost through collaborative purchasing and shared infrastructure

Journals Features

Details Page

Related Books

View ISO 16363 Preservation Metadata

Books Features

Personal Accounts for Annotations and Bookmarks

Digitized Books with Enhanced Metadata

Isn’t it all just digital content?

• Services have broad similarities

– License content

– Secure local loading and preservation rights

– Transfer content from publisher

– Develop metadata crosswalk and data loader

– Load content and perform QA

– Set up entitlements

– Distribute metadata to allow for discovery

– Gather statistics

It’s all in the details

• the combination of small differences throughout that workflow results in significantly more effort required to manage ebooks

• poorer results when measured in terms of enhanced discovery, long-term preservation and cost savings

• highlight some of those differences by looking at a few key elements of both services

Purchasing content

Journals

• Big deals still prevalent

• Wide buy-in from libraries

• Deal directly with publishers

• Annual renewals

Books

• Some big package purchases but more one-off purchases

• Wide variations in adoption among libraries

• Strong role for aggregators and agents

Licensing and DRMs

Journals

• Very standard license models (OCUL model license)

• Wide use of “perpetual access” clauses

• Transformation rights generally accepted

• No DRMs; unlimited use; ILL rights

Books

• Licenses are wildly different from publisher to publisher

• Few specific options for “perpetual access”

• Transformation rights unclear (e.g. images)

• Common requirements for DRMs (downloading, printing, copy and paste, concurrent use, watermarks)

Publisher Support Infrastructure

Journals

• Established processes to feed journal content to various channels (A&I, discovery systems)

• High volume, fast turnaround

• Metadata packaged with content

• Direct from publisher

• Standard formats (e.g. NLM DTD)

Books

• Uneven quality in supporting distribution channels

• Slow turnaround

• Gap in metadata workflow from publishers to libraries

• Intermediaries are common

• Internal practices and corporate standards

Entitlements Management

Journals

• Generally straightforward; can be managed at title and year level (12,000 titles)

• Some complications with changes in title ownership and appearance of articles in more than one publisher/provider

Books

• Entitlements must be handled at title level (100,000s)

• Cherry-picking from collections is common

• Tracking DRM and DRM rolling walls

• Entitlement is not simply “on” or “off”
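
The contrast between the two columns can be sketched as two record shapes. This is an illustration only: the class names, fields, and sample identifiers below are hypothetical and do not describe the actual Scholars Portal data model. Rolling walls and other time-dependent DRM terms are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalEntitlement:
    # Journals: a title (ISSN) plus a range of licensed years is usually enough.
    issn: str
    first_year: int
    last_year: Optional[int] = None  # None means the licence is ongoing

    def allows(self, issn: str, year: int) -> bool:
        if issn != self.issn or year < self.first_year:
            return False
        return self.last_year is None or year <= self.last_year


@dataclass
class BookEntitlement:
    # Books: one record per title, with DRM terms attached to it.
    isbn: str
    can_download: bool = False
    can_print: bool = False
    max_concurrent_users: Optional[int] = None  # None means unlimited

    def allows_open(self, isbn: str, active_users: int) -> bool:
        if isbn != self.isbn:
            return False
        return (self.max_concurrent_users is None
                or active_users < self.max_concurrent_users)


# Placeholder identifiers, not real titles.
journal = JournalEntitlement(issn="1234-5678", first_year=2002)
book = BookEntitlement(isbn="9780000000002", max_concurrent_users=1)

print(journal.allows("1234-5678", 2011))                  # True
print(book.allows_open("9780000000002", active_users=1))  # False: seat in use
```

The journal check is a simple yes or no per title and year, while the book answer depends on per-title terms and on current usage, which is one way to read "not simply on or off".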

Quality Control

Journals

• Ensure completeness at volume and issue level

• Gaps at the article level identified by end-users

• Easy resolution with publishers through reference to dataset as shipped

Books

• Completeness has to be at the title and chapter level

• Matching to MARC records via ISBN is problematic

• ISTC not in wide use

• Match to cover images

• Unreliability of title lists
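
One small part of the ISBN matching problem is that the same book may be listed as an ISBN-10 in one source and an ISBN-13 in another. A minimal sketch, assuming that both sides are normalized to ISBN-13 before comparison, is shown below. The function names are illustrative. The sketch does not verify the check digit of its input, and it does nothing about the harder case where print and electronic editions carry different ISBNs.

```python
def isbn13_check_digit(first12: str) -> str:
    # ISBN-13 weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def to_isbn13(raw: str) -> str:
    """Return a hyphen-free ISBN-13 for an ISBN-10 or ISBN-13 string."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit():
        return s
    if len(s) == 10 and s[:9].isdigit():
        # Drop the ISBN-10 check digit (which may be X), add the 978 prefix,
        # then compute a fresh ISBN-13 check digit.
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)
    raise ValueError(f"not an ISBN-10 or ISBN-13: {raw!r}")


# The same book written two ways compares equal after normalization.
print(to_isbn13("0-306-40615-2") == to_isbn13("978-0-306-40615-7"))  # True
```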

Preservation Issues

Journals

• Clear license language on perpetual access and transformation rights

• Organizational commitment to preservation of digital copies

• Fairly uniform data formats

• Publishers have legal authority to grant transformation rights

Books

• DRM restrictions are antithetical to preservation (watermarks, concurrent use)

• Ebook content does not always replicate print book content (e.g. image rights)

• Print-based preservation strategies prevail, but e-only books are the near future

Metadata Standards

Journals

• Journal metadata is XML based and increasingly converging on NLM DTD

• Metadata and data packaged in ways that make linking easy; common source

• NLM is common format for both metadata only and full-text

• DOI assignment is a reliable identifier among publishers

Books

• No dominant XML based metadata format for ebooks (ONIX is not uniformly used by scholarly publishers)

• No dominant XML format for ebook full-text (ePub is still a format for trade publishing)

• DOI assignment is hit and miss (book and chapter level)

• MARC is a foreign standard for publishers
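
To make the journal side concrete, here is a minimal sketch of a metadata crosswalk from an NLM-style article header to a flat record. The element names (article-meta, article-id, article-title, journal-title, volume, issue, fpage) follow the NLM tag set, but the sample document, the DOI, and the output field names are made up for illustration, and this is not the Scholars Portal loader. Titles containing inline markup would need extra handling.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of an NLM article header.
SAMPLE = """<article>
  <front>
    <journal-meta>
      <journal-title>Example Journal</journal-title>
      <issn>1234-5678</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.9999/example.2011.001</article-id>
      <title-group><article-title>An Example Article</article-title></title-group>
      <volume>12</volume>
      <issue>3</issue>
      <fpage>45</fpage>
    </article-meta>
  </front>
</article>"""


def crosswalk(xml_text: str) -> dict:
    """Map selected NLM-style elements to a flat dictionary."""
    root = ET.fromstring(xml_text)

    def text(path: str):
        # Return the stripped text of the first match, or None if absent.
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else None

    return {
        "doi": text(".//article-id[@pub-id-type='doi']"),
        "title": text(".//article-title"),
        "journal": text(".//journal-title"),
        "issn": text(".//issn"),
        "volume": text(".//article-meta/volume"),
        "issue": text(".//article-meta/issue"),
        "start_page": text(".//article-meta/fpage"),
    }


record = crosswalk(SAMPLE)
print(record["doi"], record["volume"], record["issue"])
```

Because most journal publishers deliver something close to this shape, one crosswalk can serve many sources. For ebooks, with no dominant format, a comparable mapping tends to be written per publisher.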

Accessibility Issues

Journals

• Provincial standard is based on WCAG Level 2

• Most PDFs, though not tagged, are readable with screen reading software

• Full downloads also allow for ingest into Kurzweil and other adaptive technologies

Books

• Online page readers with no embedded text are invisible to screen reading software

• Chapter downloads are more effective but rarely allowed

• Full book downloads require controlled access

• Older digitized materials can be difficult to read with adaptive technologies

Use

Journals

• ~50,000 daily visits

• Close to 1M article downloads monthly in peak periods

• Split of ~50/50 between publisher and SP

• Visitor flow: vast majority of traffic comes from OpenURL resolvers

• All content represented in OpenURL KBs

Books

• ~1800 daily visits

• Books accessed in monthly period?

• Much lower ratio of SP use compared to publisher

• Visitor flow: vast majority of traffic comes from library catalogues

• Only 4-5 libraries have loaded MARC records for SP content

Use Drivers - Journals

Use Drivers - Books

Use Drivers

Journals

• OpenURL resolvers

• A-Z lists

– Importance of being present in big KBs

• Issue of “dual access”

• Google indexing has a small role for OCUL users (more for external users)

• Not present directly in discovery layers

Books

• Library Catalogues

– Quality of MARC records is an issue for many

– Publishers don’t provide high quality MARC records

– Sourcing records and then linking is an issue

• Google indexing of metadata

• OpenURL resolvers

– Working to get presence

• Discovery Layer indexing of public domain content
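
Since OpenURL resolvers carry most of the journal traffic, it may help to see what such a link looks like. The sketch below builds an OpenURL 1.0 (Z39.88-2004) query in key/encoded-value form for a journal article. The resolver address and the citation values are placeholders, and the function name is illustrative, not part of any Scholars Portal interface.

```python
from urllib.parse import urlencode


def build_openurl(base_url, *, issn, volume, issue, start_page, year, doi=None):
    """Build an OpenURL 1.0 (KEV) link for a journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.issue", issue),
        ("rft.spage", start_page),
        ("rft.date", year),
    ]
    if doi:
        # A DOI travels as an identifier rather than as citation metadata.
        params.append(("rft_id", "info:doi/" + doi))
    return base_url + "?" + urlencode(params)


# Hypothetical resolver address and made-up citation values.
url = build_openurl(
    "https://resolver.example.edu/openurl",
    issn="1234-5678", volume="12", issue="3", start_page="45", year="2011",
    doi="10.9999/example.2011.001",
)
print(url)
```

The resolver matches these values against its knowledge base to decide which copy to offer, which is why presence in the large KBs matters so much for use.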

Hope for EBooks?

• Secure agreements with publishers to load all content, and not just currently subscribed content

• Establish presence in major commercial KBs

• Deal with rights issues related to indexing in discovery systems and bypass dependence on MARC

• Resist DRM encumbered content; look for other models to deal with lost income due to course adoption

• Insist that publishers support ePub 3 for accessible content

Questions?

http://journals.scholarsportal.info

http://books.scholarsportal.info