An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library. Martin R. Kalfatovic. American Library Association Annual Meeting. Collaborative Digital Initiatives: Show and Tell and Lessons Learned. June 30, 2008. Anaheim, CA.
MARTIN R. KALFATOVIC :: SMITHSONIAN INSTITUTION LIBRARIES :: AMERICAN LIBRARY ASSOCIATION :: 30 JUNE 2008
An International Cooperative Digital Library for Taxonomic Literature
Martin R. Kalfatovic
Smithsonian Institution Libraries
30 June 2008
Biodiversity
What is Biodiversity?
Genetic variability within species
Diversity of species
Ecosystems and landscapes
Biodiversity
Wholesome food, Drinkable water, Breathable air, Stable climate for
Forestry, Agriculture, Fisheries
Waste decomposition, Bioremediation, Invasive species, Pest control, Ecotourism
Pharmaceuticals, Genomics, Proteomics, Bioengineering, Biotechnology, Molecular design, Imitating nature, Designer organisms, Renewable feedstocks, Envirofriendly manufacturing processes
Over 250 years of systematic description of life
Systema naturae (10th ed. 1758) by Carl von Linné
Taxonomic Literature
Taxonomic descriptions must be published for the name to be valid
Publications must be available to the public through trusted sources
Libraries have been the traditional place
Taxonomic Literature
The cited half-life of publications in taxonomy is longer than in any other scientific discipline
* * * The decay rate is longer than in any scientific discipline
~ Macro-economic case for open access, Tom Moritz
Taxonomic Literature
The cultivation of natural science cannot be efficiently carried on without reference to an extensive library
Charles Darwin, et al. (1847)
Darwin, C. R. et al. 1847. Copy of Memorial to the First Lord of the Treasury [Lord John Russell], respecting the Management of the British Museum. Parliamentary Papers, Accounts and Papers 1847, paper number (268), volume XXXIV.253 (13 April): 1-3. [Complete Works of Charles Darwin Online]
The Taxonomic Impediment
“The taxonomic impediment is a term that describes the gaps of knowledge in our taxonomic system”
- Darwin Declaration, 1998
Taxonomic Impediment
Specimen collections, Databases, Publications, Observations, ‘Gray’ literature, Index cards, Field notebooks
The Taxonomic Impediment
The essential requirements for accessing and utilizing this global information are:
• that there is access to information held in national/regional/global collections
• that electronic data is efficiently captured and provided in usable form
• that existing information held in literature and by current experts is made available electronically
• that stability of scientific names of organisms, used to access this information, is promoted
- Darwin Declaration, 1998
Yet another physical difficulty is the task of assembling the library and indexes which will enable the student to work under proper conditions…. the beginner must now be prepared to spend liberally, or else must establish himself in an institution where a large library exists; if he work by himself with only a few books, he will have to confine himself to a very narrow specialty indeed.
'The Limitations of Taxonomy' by J.M. Aldrich, Science, April 22, 1927, vol. LXV, no. 1686, p.381
The Taxonomic Impediment
Biologia Centrali-Americana
[Chart: axis 0 to 8; regions: US & Canada, Europe, Mexico & C. America, South America]
Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London : Pub. for the editors by R. H. Porter, 1879-1915
Chart showing distribution in public collections of the complete 63-volume sets held worldwide. 2 complete copies in Central America held at the Smithsonian Tropical Research Institute Library
Digital Divide
Henry Walter Bates, The Naturalist on the River Amazons, 1863
Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet.
“Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
Digital Divide?
2003. Telluride. Encyclopedia of Life meeting
February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
May 2005. Washington. Ground work for the Biodiversity Heritage Library
June 2006. Washington. Organizational and Technical meeting
August 2006. New York Botanical Garden. BHL Director’s Meeting.
October 2006. St. Louis/San Francisco. Technical meetings
February 2007. Museum of Comparative Zoology. Organizational meeting
May 2007. Encyclopedia of Life and BHL Portal Launch. Washington DC.
BHL Timeline
BHL Members
American Museum of Natural History (New York)
Field Museum (Chicago)
Natural History Museum (London)
Smithsonian Institution Libraries (Washington)
Missouri Botanical Garden (St. Louis)
New York Botanical Garden (New York)
Royal Botanic Garden, Kew
Botany Libraries, Harvard University
Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
Marine Biological Laboratory / Woods Hole Oceanographic Institution
BHL Members
University of Illinois, Urbana-Champaign (contributing member)
Scheme for addition of European and Asian partners underway
Additional categories of membership under consideration
Encyclopedia of Life
…imagine for a moment that all the diversity of the world were finally revealed and then described, say one page to a species. The description would contain the scientific name, a photograph or drawing, a brief diagnosis, and information of where the species is found. If published in conventional book form … this Great Encyclopedia of Life would occupy 60 meters of library shelf per million species … 100 million species of organisms … would extend through 6 kilometers of shelving …
E.O. Wilson (1992)
Informatics: Marine Biological Laboratory
Missouri Botanical Garden
Species Pages & Secretariat: Smithsonian
Education and Outreach: Smithsonian & Harvard
Synthesis Center: Field Museum
Initial grant from the MacArthur and Sloan Foundations (as part of the Encyclopedia of Life grant)
Additional support from parent institutions
Additional grants being actively pursued by BHL and individual members
Funding
BHL Focus: Literature
Mass Scanning
Mass scanning is a proven technology
Post processing of generated data proven, but evolving
The Internet Archive
• 501(c)(3) organization
• Dedicated to “Universal Access to Human Knowledge”
• Founder of the Open Content Alliance
• Provides:
 – Mass scanning
 – Archival storage of files
 – Image processing
 – Technology development
Scribe Scanner
• Single Scribe Machine
 – Custom built by the Internet Archive
 – Human operated
 – 3,500 pages per shift per day
BHL Scanning Centers
Northeast Regional Scanning Center: 10 Scribe machines; MBL/WHOI, Harvard
New York Public Library: 10 Scribe machines; AMNH, NYBG
BHL Scanning Centers
University of Illinois: 2 Scribe machines
Natural History Museum, London: 1 Scribe machine
Missouri Botanical Garden: Non-Scribe operation
BHL Scanning Centers
Washington, DC
1 Scribe machine at Smithsonian Libraries
10 Scribe facility at Library of Congress with Fedlink (operational May 2008)
Scanning Stats
28 June 2008
6,153,568 pages
15,343 volumes
6,049 titles
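To put the scanning totals in proportion, the short sketch below derives a few averages from the three figures on this slide. It also divides the page total by the 3,500 pages per shift quoted for a single Scribe machine; that last number is only an illustrative equivalence, since not all scanning was done on Scribe machines, and the variable names are chosen here for illustration.

```python
# Figures as reported on the slide for 28 June 2008.
pages = 6_153_568
volumes = 15_343
titles = 6_049

# Throughput quoted on the Scribe Scanner slide (pages per shift per day).
PAGES_PER_SCRIBE_SHIFT = 3_500

pages_per_volume = pages / volumes
volumes_per_title = volumes / titles

# Hypothetical equivalence: how many single-machine shifts the page total
# would represent if every page had been scanned on a Scribe machine.
scribe_shifts = pages / PAGES_PER_SCRIBE_SHIFT

print(f"Average pages per volume:  {pages_per_volume:.0f}")
print(f"Average volumes per title: {volumes_per_title:.1f}")
print(f"Equivalent Scribe shifts:  {scribe_shifts:.0f}")
```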
Automate Discovery
Automated, scalable structural mark-up
Open to schemas for semantic mark-up
Integration of taxonomic intelligence
Structural Markup
<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>
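As a rough illustration of how a machine client might consume an article record like the one on this slide, the sketch below parses it with Python's standard library. The element names come from the slide; the function name `read_article`, the choice of which fields to treat as integers, and the page-offset calculation are assumptions made for this example, not part of any BHL tool.

```python
import xml.etree.ElementTree as ET

# The article record from the slide, reproduced as a string.
ARTICLE_XML = """<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>"""

# Assumption: these elements always hold whole numbers.
INT_FIELDS = {"volume", "issue", "start_page", "end_page",
              "start_count_page", "end_count_page"}


def read_article(xml_text):
    """Return the child elements of <article> as a flat dictionary."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        text = (child.text or "").strip()
        record[child.tag] = int(text) if child.tag in INT_FIELDS else text
    return record


article = read_article(ARTICLE_XML)

# Printed page numbers and scan sequence numbers should span the same length.
printed_pages = article["end_page"] - article["start_page"] + 1
scanned_images = article["end_count_page"] - article["start_count_page"] + 1

# Offset between the printed page number and the position in the scan.
page_offset = article["start_count_page"] - article["start_page"]

print(article["author"], printed_pages, scanned_images, page_offset)
```

Here both ranges cover 22 pages, and the printed page number trails the scan sequence number by 20, which is the kind of consistency check structural mark-up makes possible.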
Semantic Markup
GoldenGATE
The intention of the GoldenGATE editor is to build a bridge between NLP components and XML markup of natural language text according to arbitrary XML schemas. It allows the deployment of NLP components to marking up the bodies of literature they were designed for. In this way, it enables transforming the texts into XML content according to an XML schema that was designed to gain maximum benefit from the knowledge provided in them.
Integrated Open Taxonomic Access (INOTAXA)
10.7 million name strings in NameBank
Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
Taxonomic Intelligence
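The idea of locating likely name strings in OCR text and checking them against a list of known names can be sketched in a few lines. This is not the TaxonGrab algorithm; it is a deliberately naive stand-in that looks for a capitalized word followed by a lowercase word (the shape of a Latin binomial) and filters out a handful of common English words. The pattern, the word list, the function name and the sample sentence are all assumptions made for illustration, and `known_names` stands in for a lookup against a name repository such as NameBank.

```python
import re

# Naive shape of a binomial: capitalized genus, then a lowercase epithet.
CANDIDATE = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Tiny illustrative stop list; a real system would use a full lexicon.
COMMON_WORDS = {"the", "and", "was", "with", "from", "this", "that",
                "are", "were", "found", "near"}


def find_candidate_names(ocr_text, known_names=frozenset()):
    """Return (name, already_known) pairs for binomial-shaped strings."""
    candidates = []
    for match in CANDIDATE.finditer(ocr_text):
        genus, epithet = match.group(1), match.group(2)
        if genus.lower() in COMMON_WORDS or epithet in COMMON_WORDS:
            continue  # sentence-initial English words, not names
        name = f"{genus} {epithet}"
        candidates.append((name, name in known_names))
    return candidates


sample = "The type of Apis mellifera was compared with Bombus terrestris from Kent."
hits = find_candidate_names(sample, known_names={"Apis mellifera"})
print(hits)
```

Candidates flagged as not yet known are the ones that, after review, could be added to the name list, which is the iterative loop the slide describes.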
Build Content
Permissions
• Seek permissions from copyright holders
• Opt in Copyright Model: The BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals
• BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.
• Will provide a set of files to the publishers for reuse as they see fit
Successes
• Entomological News
• Journal of Hymenoptera Research
• Herpetological Review
• Publications of the San Diego Natural History Museum
• California Academy of Sciences publications
• And more ...
BHL Advantages
• Use of the articles will increase as evidenced by citation upsurge
• Long-term management of the digital assets is provided by the BHL at no cost
• Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century
• Structural mark-up of backfiles into conformance with NLM DTD (just starting)
Serve Content
Machine to machine communication
Human interfaceable portal
Standard identifiers (proponent of the “yodi” - yet another digital identifier)
???
Encyclopedia of Life
Persistent Identifiers
Stable URL, Handle, DOI, BICI/SICI, ISSN, ISBN, LSIDs
http://www.biodiversitylibrary.org
BHL Portal
• Library catalog-like interface to BHL literature
• Enhanced structural analysis to provide volume/issue/article page access to the literature
• Iterative development based on feedback from user community
• Provide access to two key audiences:
 – Humans
 – Machines
Search Browse
Infrastructure: Now / Soon / Later
Now: Missouri Botanical Garden development site
Now: Storage; Internet Archive, Missouri Botanical Garden
Soon: Move to Fedora storage model
Later: Move to a distributed Fedora storage model
• Co-evolving bioinformatics resources produce a rich information ecology:
– Consortium for the Barcoding of Life (CBOL) with gene sequences deposited in GenBank.
– GBIF’s Electronic Catalog of Taxonomic Names
– Herbaria and museum specimen databases
Looking Forward
• Quick ramp-up, high early costs – development, mass scanning, etc.
• Derive some long-term costs from the operating budgets of the member institutions (Examples under consideration: acquisitions budget, staff positions, etc.)
• Integrate functions/tasks with wider efforts where appropriate, e.g. mass storage
Looking Forward
Institutions that are creating the BHL exist to persist through time. That’s an important part of their business
The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly
The Long Now Strategy
In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned.
Charles Davies Sherborn, Epilogue to Index Animalium, March 1922
A Global Library for Life
Midrange estimate: 25% of 5 million species = 1.3 million species, or roughly 1 every 20 minutes
Low estimate: 15% of 4 million species = 0.6 million species, or roughly 1 every 44 minutes.
High estimate: 50% of 6 million species = 3 million species, or roughly 1 every 9 minutes
Conservation International
http://tinyurl.com/3hzkax
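The slide gives each estimate as both a species count and a rate, but not the time span that links the two. The sketch below back-calculates the span implied by each pair of figures. The dictionary layout and function name are assumptions for illustration, and the midrange case uses the unrounded 25% of 5 million (1.25 million) rather than the slide's rounded 1.3 million.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60

# (fraction of species, total species, minutes per species) from the slide.
estimates = {
    "low": (0.15, 4_000_000, 44),
    "midrange": (0.25, 5_000_000, 20),
    "high": (0.50, 6_000_000, 9),
}


def implied_years(fraction, total_species, minutes_per_species):
    """Years needed for the stated number of species at the stated rate."""
    species = fraction * total_species
    return species * minutes_per_species / MINUTES_PER_YEAR


spans = {label: implied_years(*values) for label, values in estimates.items()}

for label, years in spans.items():
    print(f"{label:>8}: about {years:.0f} years")
```

All three estimates work out to a span of roughly fifty years, which suggests the rates on the slide were derived from a common time horizon.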
Thank You ... now, stick around for ... Suzanne
Biodiversity Heritage Library: http://www.biodiversitylibrary.org/
Biodiversity Heritage Library Blog: http://biodiversitylibrary.blogspot.com
Encyclopedia of Life: http://www.eol.org/
Smithsonian Institution Libraries: http://www.sil.si.edu/
Universal Biological Indexer and Organizer: http://www.ubio.org/
Biologia Centrali-Americana: http://www.sil.si.edu/digitalcollections/bca/
LINKS
Thanks to:
Chris Freeland, Missouri Botanical Garden
Tom Garnett, The Biodiversity Heritage Library
The staff at the Internet Archive
Images from
The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
Martin R. Kalfatovic
Suzanne C. Pilsk
Bernard Scaife
CREDITS