Ebooks: digitizing our print collections Sian Meikle University of Toronto Libraries

Digitizing our print collections

About mass digitization
  Who is digitizing?
  What is getting digitized?
  What is the output?
  Digitization issues

Integration of print-to-digital content
  Choices for discovery and access
  Issues for delivery

Google
Partners: 11 libraries; several publishers
Library partners: University of California, National Library of Catalonia, Complutense University of Madrid, Harvard University, University of Michigan, New York Public Library, Oxford University, Stanford University, University of Texas at Austin, University of Virginia, University of Wisconsin at Madison

Scanning: in-copyright & out-of-copyright
In-copyright:
  Searching: all content fully indexed
  Display: snippets can be viewed; full text can be bought or located in a library
Out-of-copyright:
  PDFs can be downloaded for personal use, bought, or located in a library

Google output
  Metadata
  Scanned images: TIFFs
  Access derivatives:
    JPEGs
    Image-based PDFs (one per page or one per book)
    Uncorrected OCR

Open Content Alliance
Partners:
  Internet Archive
  29 libraries
  Publishers: O’Reilly Media
  Industrial partners: MSN, HP Labs, Adobe, Xerox

Open Content Alliance library partners
Boston Public Library, Boston Library Consortium, Columbia University, Emory University, European Archive, Indiana University, Johns Hopkins University Libraries, McMaster University, Memorial University of Newfoundland, Missouri Botanical Garden, MSN, National Archives (United Kingdom), National Library of Australia, Rice University, Tufts University, San Francisco Public Library, Simon Fraser University, Smithsonian Libraries, University of Alberta, University of British Columbia, University of California, University of Chicago Library, University of Georgia, University of Illinois Urbana-Champaign, University of North Carolina, University of Ottawa, University of Pittsburgh, University of Texas, University of Toronto, University of Virginia, Washington University, York University

Scanning: out-of-copyright and copyright-cleared material
Searching:
  Full content via MSN Live Book Search
  Metadata via Internet Archive
Use: all scans & derivatives can be downloaded

Open Content Alliance output
  Metadata
  Scanned images: TIFFs/J2Ks/CR2s
  Access derivatives:
    JPEGs
    DjVu (viewer, requires plugin)
    Flip book (viewer, does not require plugin)
    Image-based PDFs (one per book)
    Uncorrected OCR integrated into the PDFs

Mass digitization content
900,000 pre-1923 titles
  60% are unique
  40% have more than one manifestation

[Chart: Published page count, in billions of pages (axis 0-12), for pre-1923, 1923-1963, and post-1963 publications]
Data courtesy Microsoft, 2007

Comparison: scanned & born-digital

| Scanned from print                | Born-digital                       |
|-----------------------------------|------------------------------------|
| Page images                       | E-text                             |
| Search uncorrected OCR            | Search text                        |
| TOC, title page, index are marked | Can be highly segmented, linked    |
| Literature, history, …            | STM, social sciences, reference, … |

Mass digitization: some Q&A
Duplication
  Q: How do we guard against duplication?
  A: It might be cheaper just to scan duplicates.
Omissions
  Q: What about fold-outs, uncut pages, tightly bound books, print running into margins…?
  A: Mass digitization works because it is efficient. A parallel process should handle exception cases.

Mass digitization at U of Toronto
  Not scanned: 2,400 (8%)
  Scanned: 32,000 (92%)

Method 1: Union digital repository
Internet Archive (OCA)
  E-books integrated with non-book content
  User contributions (content, reviews)
  Other sites can point to this content

Method 2: Full text search repository
MSN Live Books and Google Books
  Both cross-book & intra-book searching
  Google’s goal is to index
  MSN is developing a reading environment
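
Cross-book and intra-book searching can both be served from one inverted index that records, for each term, the books and pages where it occurs. The Python sketch below is a toy illustration under stated assumptions: the book identifiers, page texts, and function names are hypothetical, only single-word queries are handled, and it does not describe how Google Books or MSN Live Book Search actually build their indexes.

```python
import re
from collections import defaultdict

# term -> book identifier -> set of page numbers containing the term
Index = dict[str, dict[str, set[int]]]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters.

    Uncorrected OCR errors (e.g. "tbe" for "the") are indexed as-is,
    so misrecognized words are simply not found.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books: dict[str, list[str]]) -> Index:
    """Index each book's page texts; pages are numbered from 1."""
    index: Index = defaultdict(lambda: defaultdict(set))
    for book_id, pages in books.items():
        for page_no, text in enumerate(pages, start=1):
            for term in tokenize(text):
                index[term][book_id].add(page_no)
    return index


def cross_book_search(index: Index, term: str) -> list[str]:
    """Cross-book search: which books contain the (single-word) term?"""
    return sorted(index.get(term.lower(), {}))


def intra_book_search(index: Index, term: str, book_id: str) -> list[int]:
    """Intra-book search: on which pages of one book does the term occur?"""
    return sorted(index.get(term.lower(), {}).get(book_id, set()))


if __name__ == "__main__":
    # Hypothetical page-level OCR text for two scanned books
    books = {
        "book-a": ["The fur trade in Canada", "Voyageurs and canoes"],
        "book-b": ["Canoes of the north"],
    }
    idx = build_index(books)
    print(cross_book_search(idx, "canoes"))            # ['book-a', 'book-b']
    print(intra_book_search(idx, "canoes", "book-a"))  # [2]
```

Because the index is keyed at page level, the same structure answers both questions; a production system would add phrase search, ranking, and tolerance for OCR errors.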

[Screenshots: Google (three slides)]

[Screenshots: MSN Live Book Search (three slides)]

Why load it locally?
Safekeeping
  Lots of copies keep stuff safe!
Discovery
  Integration with licensed books
  Integration with non-book content
  Local subject specialization

Method 3: Local load
University of Michigan
  E-books linked from OPAC
  Rights system decides who can view:
    Nobody
    University of Michigan
    United States
    World
  In-book searching: OCR, one book at a time
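
The rights decision on this slide amounts to comparing a title's access level with who the user is and where they are. A minimal Python sketch, assuming a hypothetical rights table keyed by item identifier and two yes/no facts about the user (authenticated University of Michigan user; located in the United States). The slides do not say how a title's level is assigned or how users are identified, so those parts are left out.

```python
from enum import Enum


class Access(Enum):
    """The four viewing levels named on the slide (identifiers are hypothetical)."""
    NOBODY = "nobody"
    UMICH = "University of Michigan"
    US = "United States"
    WORLD = "world"


# Hypothetical rights table: item identifier -> access level.
# How each title gets its level is not described in the slides.
RIGHTS: dict[str, Access] = {
    "item-0001": Access.WORLD,
    "item-0002": Access.US,
    "item-0003": Access.UMICH,
    "item-0004": Access.NOBODY,
}


def can_view(level: Access, is_umich_user: bool, in_us: bool) -> bool:
    """Decide whether a user may view full page images at this level."""
    if level is Access.WORLD:
        return True
    if level is Access.US:
        return in_us
    if level is Access.UMICH:
        return is_umich_user
    return False  # Access.NOBODY: no one sees full view


def can_view_item(item_id: str, is_umich_user: bool, in_us: bool) -> bool:
    # Deny by default: an item with no rights record is treated as NOBODY.
    level = RIGHTS.get(item_id, Access.NOBODY)
    return can_view(level, is_umich_user, in_us)
```

Denying access when no rights record exists is the conservative choice: a missing or mistyped record can then never expose an in-copyright book.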

MBooks at University of Michigan
Download and validation:
  Local data mover GROOVE (Perl & MySQL)
Data integrity:
  MD5 fixity checks on JPEGs, TIFFs, UTF-8 text
Quality assurance:
  GROOVE samples 20-page chunks for students to check with ACDSee
  Problems referred to Google for later correction
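
A fixity check records a checksum for every file when it is ingested and later recomputes it to detect silent corruption or loss. A minimal Python sketch, assuming MD5 digests and one directory of page files per book; the manifest layout, file names, and function names are hypothetical and are not GROOVE's actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large TIFFs
    are never loaded into memory all at once."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(book_dir: Path) -> dict[str, str]:
    """Record a checksum for every file in a book's directory at ingest time."""
    return {p.name: md5_of(p) for p in sorted(book_dir.iterdir()) if p.is_file()}


def check_fixity(book_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Recompute checksums and report files that are missing or have changed."""
    problems = []
    for name, expected in manifest.items():
        path = book_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif md5_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book_dir = Path(tmp)
        # Hypothetical page files: a page image and its OCR text
        (book_dir / "00000001.tif").write_bytes(b"page image bytes")
        (book_dir / "00000001.txt").write_text("page one OCR", encoding="utf-8")
        manifest = make_manifest(book_dir)
        # Simulate silent corruption of the OCR file
        (book_dir / "00000001.txt").write_text("page one 0CR", encoding="utf-8")
        print(check_fixity(book_dir, manifest))  # ['checksum mismatch: 00000001.txt']
```

MD5 is adequate for detecting accidental corruption; where deliberate tampering is a concern, a SHA-2 digest such as hashlib.sha256 can be swapped in without changing the rest of the sketch.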

[Screenshots: University of Michigan: OPAC link (two slides); e-book display; search in e-book]

How do people read?
Intentional reading
  Attentive, sustained, linear reading of text
  Heavily influenced by printed-book culture
  Dominant in classical and scholarly literature
Functional reading
  Manipulating different content types
  Web browsing, text database searching
  Most screen reading is functional
Hillesund, T., & Noring, J. E. (2006)

How do people know what they’ve read?
[A] strong relationship…exists between the sensory motor representation of the user and his/her treatment of the information content of the paper book or e-book…
Because an electronic book is functionally closer to a computer than a traditional book […] it does not provide the external indicators to memory that the classical book does…
Morineau et al., 2005

Delivering the book to the user
Printed books:
  Make use copy → Make discovery surrogate → Search surrogates, choose candidates → Examine candidates → Browse more candidates → Choose material
Online books:
  Make discovery surrogate → Search surrogates, choose candidates → Examine candidates → ??? → Choose material → Make use copy
Diagram label: User tasks

Implications for mass digitization
  Support production of good print copies for use
  Target TOC and index for indexing & correction
  Provide granular linking
  Provide browse functions
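
Granular linking depends on mapping the printed page numbers in a corrected table of contents to the sequence numbers of the scanned page images. A minimal Python sketch, assuming a single fixed offset for covers and front matter and a hypothetical URL pattern on example.org; the class and function names are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical URL pattern; each repository has its own page-link scheme.
LINK_TEMPLATE = "https://example.org/books/{book_id}/seq/{seq}"


@dataclass
class TocEntry:
    title: str
    printed_page: int  # page number as printed in the book


def scan_sequence(printed_page: int, offset: int) -> int:
    """Map a printed page number to a 1-based scan image sequence number.

    offset = number of scanned images (covers, front matter) that precede
    printed page 1. Assumes one continuous arabic page numbering.
    """
    return printed_page + offset


def toc_links(book_id: str, toc: list[TocEntry], offset: int) -> list[tuple[str, str]]:
    """Return (section title, deep link) pairs for a corrected TOC."""
    return [
        (entry.title,
         LINK_TEMPLATE.format(book_id=book_id,
                              seq=scan_sequence(entry.printed_page, offset)))
        for entry in toc
    ]


if __name__ == "__main__":
    toc = [TocEntry("Preface", 1), TocEntry("Chapter I", 9)]
    for title, url in toc_links("hypothetical-id", toc, offset=12):
        print(title, url)
```

A single offset is the simplest workable assumption; books with unnumbered plates or roman-numbered front matter would instead need the printed page label stored for every scanned image.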

References

Blanche, C., Gueguen, N., Morineau, T., & Tobin, L. (2005). The emergence of the contextual role of the e-book in cognitive processes through an ecological and functional analysis. International Journal of Human-Computer Studies, 62(3), 329-348.

Christianson, M., & Aucoin, M. (2005). Electronic or print books: Which are used? Library Collections, Acquisitions, and Technical Services, 29(1), 71-81.

Hillesund, T., & Noring, J. E. (2006). Digital libraries and the need for a universal digital publication format. JEP: the Journal of Electronic Publishing, 9(2).

Levine-Clark, M. (2006). Electronic book usage: A survey at the University of Denver. portal: Libraries and the Academy, 6(3), 285-299.

Su, S. (2005). Desirable search features of web-based scholarly e-book systems. The Electronic Library, 23(1), 64-71.

Mass digitization archives
  Google Books: http://books.google.com/
  Internet Archive: http://archive.org/
  MSN Live Book Search: http://books.live.com/
  University of Michigan: http://mirlyn.lib.umich.edu/