The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life

DESCRIPTION

Presentation at the Biodiversity Heritage Library @ Smithsonian Libraries event during ALA (June 25, 2007) held at the National Museum of Natural History

1. The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life. Martin R. Kalfatovic, Smithsonian Institution Libraries

2. Biodiversity Heritage Library

3. Structure of the Encyclopedia of Life

4. Serine Molecule

5. Education & Outreach (Smithsonian/Harvard); Informatics (Marine Biological Laboratory); Secretariat (Smithsonian); Synthesis Center (Field Museum); Biodiversity Heritage Library

6. Biodiversity Heritage Library

  • 2003, Telluride. Encyclopedia of Life meeting
  • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
  • May 2005. Washington. Groundwork for the Biodiversity Heritage Library
  • June 2006. Washington. Organizational and Technical meeting
  • August 2006. New York Botanical Garden. BHL Directors Meeting.
  • October 2006. St. Louis/San Francisco. Technical meetings
  • February 2007. Museum of Comparative Zoology. Organizational meeting

7. Biodiversity Heritage Library

  • American Museum of Natural History (New York)
  • Field Museum (Chicago)
  • Natural History Museum (London)
  • Smithsonian Institution (Washington)
  • Missouri Botanical Garden (St. Louis)
  • New York Botanical Garden (New York)
  • Royal Botanic Gardens, Kew
  • Botany Libraries, Harvard University
  • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
  • Marine Biological Laboratory / Woods Hole Oceanographic Institution

James Dwight Dana, Zoophytes. Atlas, 1849

8. Taxonomic Literature

  • Over 250 years of systematic description of life
  • The cited half-life of publications in taxonomy is longer than in any other scientific discipline
  • The decay rate is longer than in any scientific discipline
    • Tom Moritz

9. Literature Repatriation

Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London: Pub. for the editors by R. H. Porter, 1879-1915

10. Digital Divide?

11. Digital Divide?

Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database. Much of the information Chavan seeks is in old, out-of-print tomes. To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet.

"Science in the Web Age: The Real Death of Print" by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005

Henry Walter Bates, The Naturalist on the River Amazons, 1863

12. Narrowing the Divide

13.

  • Core literature pre-1923: 400,000 (80 million pages)
  • All pre-1923: 600-750,000 (120-150 million pages)
  • All literature: 1.4-1.6 million (280-320 million pages)
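
The page counts in these estimates appear to follow from the item counts at a fixed ratio. The sketch below checks that arithmetic. It is only an illustration: the slide does not state the unit of the item counts, and the names `ESTIMATES`, `pages_per_item` and `implied_ratios` are made up for this example.

```python
# Figures copied from the slide: (label, item-count range, page-count range).
# The unit of the item counts is not stated on the slide.
ESTIMATES = [
    ("Core literature pre-1923", (400_000, 400_000), (80_000_000, 80_000_000)),
    ("All pre-1923", (600_000, 750_000), (120_000_000, 150_000_000)),
    ("All literature", (1_400_000, 1_600_000), (280_000_000, 320_000_000)),
]


def pages_per_item(items, pages):
    """Divide the low page count by the low item count, and high by high."""
    return (pages[0] / items[0], pages[1] / items[1])


def implied_ratios():
    """Map each estimate label to its implied (low, high) pages per item."""
    return {label: pages_per_item(items, pages) for label, items, pages in ESTIMATES}


if __name__ == "__main__":
    for label, (low, high) in implied_ratios().items():
        print(f"{label}: {low:.0f} to {high:.0f} pages per item")
```

Every row works out to the same ratio, which suggests the page totals were derived from the item counts with a single average length per item rather than counted independently.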

Biodiversity Heritage Library

Mass. Zoological and Botanical Survey, Reports on the fishes, reptiles and birds of Massachusetts, 1839

14. Changing Priorities

  • Open Access for scientific literature
  • Encourage re-use and re-purposing of the data in multiple and diverse systems
  • Work with non-commercial publishers to provide access

15. Changing Priorities

  • BHL has had discussions with various society publishers as well as:
    • BioOne
    • JSTOR

T.H. Huxley by Leslie Ward (Spy)

16. Digital Book Creation

  • Automated structure detection vital for serials
  • Taxonomic Intelligence
  • Digital Identifiers
  • Scalable mass scanning (outside of the Google environment)

Richard Owen by Leslie Ward (Spy)

17. BHL Structural Metadata: First Ingest (Internet Archive)

    Sub-element   Barcode     Bib #
                  390888347   45632
                  390888346   45632
                  390888345   45632
                  390888344   45632
                  390888343   45632

18. BHL Structural Metadata: Sub-Element Map (Internet Archive)

    Sub-element   Barcode     Bib #
    5             390888343   45632
    4             390888343   45632
    3             390888343   45632
    2             390888343   45632
    1             390888343   45632

19. BHL Structural Metadata: Page Structure Map (Internet Archive)

XML structure map that delineates the relationships of the images created automatically

    Sub-element   Barcode     Bib #
    1             390888343   45632

    Image Number: 0005, 0004, 0003, 0002, 0001

20. Taxonomic Intelligence

21. Taxonomic Intelligence

  • 9.4 million name strings in NameBank
  • Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
  • Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
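
To make the idea of locating likely name strings in OCR text concrete, here is a deliberately simplified sketch. It is not the TaxonGrab algorithm, which the slides do not describe. It only looks for a capitalized word followed by a lowercase word (the shape of a Latin binomial) and discards pairs containing common English words. The function name `candidate_names`, the tiny stop list, and the `known_names` parameter (standing in for a name list such as NameBank) are all assumptions made for illustration.

```python
import re

# Shape of a Latin binomial: capitalized genus, then lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Tiny illustrative stop list; a real system would need a far larger lexicon.
COMMON_WORDS = {
    "the", "and", "was", "were", "from", "with", "this", "that",
    "are", "has", "have", "for", "not", "its", "found", "collected",
}


def candidate_names(ocr_text, known_names=frozenset()):
    """Return likely name strings in the order they appear in the text."""
    found = []
    for match in BINOMIAL.finditer(ocr_text):
        genus, epithet = match.group(1), match.group(2)
        name = f"{genus} {epithet}"
        if name in known_names:
            # Already-known names are accepted without further checks.
            found.append(name)
        elif genus.lower() not in COMMON_WORDS and epithet not in COMMON_WORDS:
            found.append(name)
    return found


if __name__ == "__main__":
    text = ("The finch Telespiza palmeri was collected on Laysan. "
            "Moho bishopi is found on Molokai.")
    known = set()
    # Each pass can add newly found strings to the known set.
    known.update(candidate_names(text, known))
    print(sorted(known))
```

Feeding the names found in one pass back in as `known_names` for the next mirrors, in a very rough way, the iterative processing described above: the name list grows, and names already on it no longer depend on the heuristic.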

Georges Louis Leclerc, comte de Buffon, Histoire naturelle, générale et particulière (Oiseaux), 1799-1808

22. Digital Identifiers

  • Digital Object Identifiers (DOI)
  • Handles
  • Life Science Identifiers (LSID)
  • URIs
  • Etc.

Telespiza palmeri, Avifauna of Laysan, 1893-1900

23. Digital Identifiers

  • Factors:
    • Cost per identifier
    • Community acceptance
    • Scalability
  • BHL is working with TDWG and others to come up with the best scheme(s)

Moho bishopi, Avifauna of Laysan, 1893-1900

24. Scalable Mass Scanning

25. The Internet Archive

  • 501(c)(3) organization
  • Dedicated to Universal Access to Human Knowledge
  • Founder of the Open Content Alliance
  • Provides:
    • Mass scanning
    • Archival storage of files
    • Image processing
    • Technology development

26. Internet Archive Scribe Scanner

  • Single Scribe Machine
    • Human operated
    • 200 volumes per shift per week
    • ~ 70,000 pages from a single machine per week
    • Cost: $100,000 / year
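
The Scribe figures above can be combined into a few derived numbers. The sketch below is back-of-the-envelope arithmetic only. It assumes the machine runs 52 weeks a year at the stated weekly rate, which the slide does not say, and the function names are invented for this example. The 80 million page figure is the core pre-1923 estimate given earlier in the deck.

```python
# Figures from the slide.
VOLUMES_PER_WEEK = 200
PAGES_PER_WEEK = 70_000
COST_PER_YEAR = 100_000  # US dollars per machine

# Assumption (not on the slide): year-round operation at the stated rate.
WEEKS_PER_YEAR = 52


def pages_per_volume():
    """Average volume length implied by the weekly figures."""
    return PAGES_PER_WEEK / VOLUMES_PER_WEEK


def cost_per_page():
    """Approximate cost per scanned page in US dollars."""
    return COST_PER_YEAR / (PAGES_PER_WEEK * WEEKS_PER_YEAR)


def machine_years(total_pages):
    """Years one machine would need to scan total_pages."""
    return total_pages / (PAGES_PER_WEEK * WEEKS_PER_YEAR)


if __name__ == "__main__":
    core_pages = 80_000_000  # core pre-1923 literature estimate from the deck
    print(f"Pages per volume: {pages_per_volume():.0f}")
    print(f"Cost per page: ${cost_per_page():.3f}")
    print(f"Machine-years for core literature: {machine_years(core_pages):.1f}")
```

Under these assumptions a single machine would need roughly two decades to cover the core literature, which is consistent with the deck's emphasis on multi-unit pods at several sites.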

27. Internet Archive Scribe: Boston

  • Cooperative facility with the Boston Library Consortium (19 New England Libraries)
  • BHL Members MBL/WHOI and Harvard Libraries will use the facility
  • Status: In production

28. Internet Archive Scribe: Boston

29. Internet Archive Scribe: London

  • Single Scribe in place
  • Projected 5 unit pod to be located at The Natural History Museum
  • Status: In production

30. Internet Archive Scribe: London

31. Internet Archive Scribe: Washington

  • Single unit arrived May 5
  • Funded by Smithsonian Libraries
  • Projected 5 unit BHL pod in National Museum of Natural History
  • Projected 10-15 unit pod shared by Smithsonian/BHL and regional Washington libraries

32. Internet Archive Scribe: Washington

33. Internet Archive Scribe: New York

  • Current BHL plans focus on sharing a 10 unit pod located at the New York Public Library
  • American Museum of Natural History and New York Botanical Garden will use this facility
  • Status: in planning

Carl von Linné (1707-1778)

34. Internet Archive Scribe: Illinois

  • Two machines funded by State of Illinois
  • UIUC scanning Fieldiana (all series)
  • Arrangement coordinated by Michael Godow, Bryan Heidorn (UIUC/GSLIS), Betsy Kruger (UIUC/Library)
  • Status: In production

35. Internet Archive Scribe: Illinois

36. BHL Portal

  • Library catalog-like interface to BHL literature
  • Enhanced structural analysis to provide volume/issue/article page access to the literature
  • Iterative development based on feedback from user community
  • Provide access to two key audiences:
    • Humans
    • Machines

37. www.biodiversitylibrary.org

38. 39. 40. 41. 42. 43. 44. 45.

46. BHL Literature Online

  • 1,291,485 pages
  • 657,310 pages via BHL Portal

Yet another physical difficulty is the task of assembling the library and indexes which will enable the student to work under proper conditions. the beginner must now be prepared to spend liberally, or else must establish himself in an institution where a large library exists; if he work by himself with only a few books, he will have to confine himself to a very narrow specialty indeed.

'The Limitations of Taxonomy' by J.M. Aldrich, Science, April 22, 1927, vol. LXV, no. 1686, p. 381

47. Biodiversity Heritage Library

48. Biodiversity Heritage Library