IFLA World Library and Information Congress August 12, 2008 Nancy Gwinn, Director, Smithsonian Institution Libraries [email protected] and Connie Rinaldo, Ernst Mayr Library, Harvard University [email protected]


Page 1: Ifla Bhl080208cr

IFLA World Library and Information Congress August 12, 2008

Nancy Gwinn, Director, Smithsonian Institution Libraries [email protected]
Connie Rinaldo, Ernst Mayr Library, Harvard University [email protected]

Page 2: Ifla Bhl080208cr

Overview
• How BHL began
• Purpose
• Encyclopedia of Life
• Why now & Who is involved?
• How is it being done?
• What makes this different?
• Who else is involved?
• Challenges ahead

Page 3: Ifla Bhl080208cr

What?

MISSION
– Provide open access to biodiversity literature for scientists, researchers, students, and public world-wide

GOALS
– Digitize the core published literature of biodiversity
– Collaborate with the global taxonomic community, rights holders and others

“The cultivation of natural science cannot be efficiently carried on without reference to an extensive library.”

--C. Darwin et al 1847

Page 4: Ifla Bhl080208cr

Why now?
• Biodiversity is HOT
• Taxonomic literature has extreme longevity
– Current taxonomic literature often relies on texts and specimens > 100 years old.
• Tractable, well-defined scientific domain
• Cost low – 10-19 cents a page
• Supports GBIF and other international initiatives
• Literature repatriation

[Figure: Distribution of Biologia Centrali-Americana, by region (US & Canada, Europe, Mexico & C. America, South America). Courtesy, Martin Kalfatovic]

Page 5: Ifla Bhl080208cr

Encyclopedia of Life

“The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology”

E.O. Wilson

Page 6: Ifla Bhl080208cr

Encyclopedia of Life
• EOL needed the literature underpinning in the BHL project
• EOL launched on 9th May, 2007
• BHL now key partner in EOL project
• Total funding will reach at least $50M

Page 7: Ifla Bhl080208cr

How big is the Biodiversity domain?
• Over 5.4 million books dating back to 1469
• 800,000 monographs
• 40,000 journal titles (12,500 current)
• 50% pre-1923

Page 8: Ifla Bhl080208cr

Why is the literature so important?

Taxonomic descriptions must be published for the name to be valid

Publications must be available to the public through trusted sources

Libraries have been the traditional place

Page 9: Ifla Bhl080208cr

Taxonomic Literature

The cited half-life of publications in taxonomy is longer than in any other scientific discipline

The citation decay rate is slower than in any other scientific discipline

Page 10: Ifla Bhl080208cr

WHO? Member Partners

Museum libraries
– American Museum of Natural History
– Field Museum of Natural History
– Natural History Museum, London
– Smithsonian Institution Libraries
– Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University

Botany libraries
– Missouri Botanical Garden
– New York Botanical Garden
– Royal Botanic Garden, Kew
– Botany Libraries, Harvard University

Research institute library
– Marine Biological Laboratory / Woods Hole Oceanographic Institution

Page 11: Ifla Bhl080208cr

BHL Collections
• 1.3 million catalogue records
• 73% are monographs (remainder are serials at title-level)
• 63% is English language material
• The next most popular language (9%) is German
• About 30% of material was published before 1923

Page 12: Ifla Bhl080208cr

The Internet Archive
• 501(c)(3) organization
• Dedicated to “Universal Access to Human Knowledge”
• Founder of the Open Content Alliance
• Provides:
– Mass scanning
– Archival storage of files
– Image processing
– Technology development

Page 13: Ifla Bhl080208cr

HOW?
• Internet Archive scanning centers in London, New York, DC, Boston, Illinois
• Image files/text derived from OCR

Page 14: Ifla Bhl080208cr

ERNST MAYR LIBRARY STORY (1)

Workflow:
– Generate picklists
– Identify acceptable items: size, foldouts
– Avoid duplication
– Barcode
– Generate packing list; check-out & pack books
– Check-in, reshelve returns
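The selection steps in this workflow (size, foldouts, duplication) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Candidate` fields, the 30 cm height limit, and matching duplicates by case-insensitive title are all hypothetical and do not describe the library's actual tools.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one volume being considered for scanning."""
    barcode: str
    title: str
    height_cm: float
    has_foldouts: bool


def build_picklist(candidates, already_scanned_titles, max_height_cm=30.0):
    """Return the candidates that pass the size, foldout, and duplication checks.

    max_height_cm is an assumed limit chosen for illustration only.
    """
    # Titles are compared case-insensitively; real de-duping would need more.
    seen = {title.casefold() for title in already_scanned_titles}
    picklist = []
    for item in candidates:
        key = item.title.casefold()
        if item.height_cm > max_height_cm or item.has_foldouts or key in seen:
            continue  # oversized, has foldouts, or already covered
        seen.add(key)
        picklist.append(item)
    return picklist
```

Items that fail a check are simply skipped here; in practice they might be routed to a separate queue, such as the “boutique” scanning mentioned later in the talk.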

Page 15: Ifla Bhl080208cr

ERNST MAYR LIBRARY STORY (2)
• Serials bid list & Monograph de-duping tool
• OCLC Collection Analysis Tool
• Internet Archive provides image files & text from Optical Character Recognition (OCR) to BHL portal
• “Boutique” scanning
• 2.5 FTE devoted to project

Page 16: Ifla Bhl080208cr


Page 17: Ifla Bhl080208cr

What makes this project different?

TAXONOMIC INTELLIGENCE

Page 18: Ifla Bhl080208cr

“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”
~ Grimaldi & Engel, 2005, Evolution of the Insects

Page 19: Ifla Bhl080208cr


Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms.

Established at the Marine Biological Laboratory/Woods Hole Oceanographic Institution

Page 20: Ifla Bhl080208cr

Taxonomic Intelligence
• 10.7 million name strings in NameBank
• Uses sophisticated algorithm to locate likely name strings in OCR text
• Processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
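To make the idea of locating name strings in OCR text concrete, here is a deliberately naive sketch. It is not the algorithm the project uses: it assumes a simple pattern for two-word names (capitalized genus, lowercase epithet) and checks candidates against a tiny hypothetical set standing in for NameBank.

```python
import re

# Hypothetical stand-in for NameBank: a tiny set of known name strings.
KNOWN_NAMES = {"Homo sapiens", "Apis mellifera", "Quercus alba"}

# Candidate binomial: a capitalized word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")


def find_names(ocr_text, known_names=KNOWN_NAMES):
    """Return (name, offset) pairs for candidates that appear in known_names."""
    hits = []
    for match in BINOMIAL.finditer(ocr_text):
        # Normalize whitespace so a name split across a line break still matches.
        name = f"{match.group(1)} {match.group(2)}"
        if name in known_names:
            hits.append((name, match.start()))
    return hits
```

A lookup like this only finds names already in the set and spelled exactly; coping with OCR errors, abbreviated genera, and previously unseen names is where a more sophisticated approach would be needed.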

Page 21: Ifla Bhl080208cr

BHL Portal
http://www.biodiversitylibrary.org
• Portal developed at Missouri Botanical Garden
• Portal serves image & text files & uses a variety of tools to organize the content
• Persistent URLs allow linking at bibliographic record, volume, & page levels in BHL & to other taxonomic services
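As a rough illustration of linking at the three levels named above, the sketch below builds one URL per level. The path segments (`bibliography`, `item`, `page`) and the integer identifiers are assumptions made for this example; the slides do not specify the portal's URL scheme.

```python
BASE_URL = "http://www.biodiversitylibrary.org"

# Assumed path segment for each linking level (illustrative, not confirmed).
LEVEL_PATHS = {
    "bibliography": "bibliography",  # bibliographic record
    "volume": "item",                # a single scanned volume
    "page": "page",                  # an individual page
}


def persistent_url(level, identifier):
    """Build a link for the given level and a hypothetical numeric identifier."""
    if level not in LEVEL_PATHS:
        raise ValueError(f"unknown level: {level!r}")
    if identifier <= 0:
        raise ValueError("identifier must be a positive integer")
    return f"{BASE_URL}/{LEVEL_PATHS[level]}/{identifier}"
```

For example, `persistent_url("page", 123)` would give a link that another taxonomic service could store and cite, provided the identifier stays stable over time.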

Page 22: Ifla Bhl080208cr


Page 23: Ifla Bhl080208cr

Page Delivery

Page 24: Ifla Bhl080208cr


Page 25: Ifla Bhl080208cr

Publishers & Permissions
• Seek permissions from copyright holders of journals (49 so far)
• Opt in Copyright Model
• Will digitize learned society backfiles and mount them through the BHL Portal at no cost to the society
• Will provide a set of files to the publishers for reuse as they see fit

Page 26: Ifla Bhl080208cr

So far, BHL has shown
• High levels of OCR accuracy in late 19th and 20th century printing
• Taxonomic intelligence (species name finding) across millions of pages against nearly 11 million names in NameBank is highly effective
• Administratively separate and geographically dispersed institutions can collaborate effectively
• Society journal publishers are enthusiastic about participation in the BHL opt-in copyright model
• The project has generated excitement in the international community and many opportunities to develop new partnerships
• Ability to generate significant financial support

Page 27: Ifla Bhl080208cr

Funding
• Initial $3 million from John D. and Catherine T. MacArthur Foundation
• Gordon Moore Foundation
• Individual members (Harvard, Smithsonian, NY Botanical Garden)

Page 28: Ifla Bhl080208cr

Potential Collaborators
• Open Content Alliance
• International Commission on Zoological Nomenclature
• European Distributed Institute of Taxonomy
• Global Biodiversity Information Facility (GBIF)
• Atlas of Living Australia
• BioOne
• Chinese Academy of Sciences

Page 29: Ifla Bhl080208cr

Future Directions
• Sustainable platform
• Ability to scan fold-outs, over-sized volumes
• Time to access pages slow
• Mirror sites
• How to represent results to users?
– 6.7 million pages in BHL portal
– 14.7 million name occurrences using Taxon Finder
– One search can yield 19,000 occurrences of single name

Page 30: Ifla Bhl080208cr

Future Directions
• Further develop global partnerships (BHL Europe, e.g.) & add multiple languages
• Further develop partnerships with publishers
• Improved OCR
• Enhance connections with EOL
• Linkages to molecular, morphological & other data types
• Expand content analysis & tools to new audiences
• Grey Literature & archives
• Article-level analysis of serials using automated tools

Page 31: Ifla Bhl080208cr

CREDITS

Thanks to:

All BHL partners and staff whose slides and presentations we borrowed

Images from

The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)

Page 32: Ifla Bhl080208cr

LINKS

Biodiversity Heritage Library http://www.biodiversitylibrary.org/

Biodiversity Heritage Library Blog http://biodiversitylibrary.blogspot.com

Encyclopedia of Life http://www.eol.org/

Smithsonian Institution Libraries http://www.sil.si.edu/

Universal Biological Indexer and Organizer http://www.ubio.org/

Biologia Centrali-Americana http://www.sil.si.edu/digitalcollections/bca/