Martin R. Kalfatovic Smithsonian Institution Libraries OCLC Digital Forum East 2009 5 November 2009 Arlington, VA An Inordinate Fondness for Data The Biodiversity Heritage Library

An Inordinate Fondness for Data: The Biodiversity Heritage Library


An Inordinate Fondness for Data: The Biodiversity Heritage Library. Martin R. Kalfatovic. OCLC Digital Forum East 2009. November 5, 2009. Arlington, VA.


Page 1: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Martin R. Kalfatovic
Smithsonian Institution Libraries

OCLC Digital Forum East 2009
5 November 2009
Arlington, VA

An Inordinate Fondness for Data: The Biodiversity Heritage Library

Page 3: An Inordinate Fondness for Data: The Biodiversity Heritage Library

American Museum of Natural History (New York)

Academy of Natural Sciences Philadelphia

California Academy of Sciences (San Francisco)

Field Museum (Chicago)

Natural History Museum (London)

Smithsonian Institution Libraries (Washington)

Missouri Botanical Garden (St. Louis)

New York Botanical Garden (New York)

Royal Botanic Garden, Kew

Botany Libraries, Harvard University

Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University

Marine Biological Laboratory / Woods Hole Oceanographic Institution

Page 4: An Inordinate Fondness for Data: The Biodiversity Heritage Library

The Encyclopedia of Life

Page 5: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Informatics: Marine Biological Laboratory, Missouri Botanical Garden

Species Pages & Secretariat: Smithsonian

Education and Outreach: Smithsonian & Harvard

Synthesis Center: Field Museum

Page 8: An Inordinate Fondness for Data: The Biodiversity Heritage Library

How much is there:

Core literature pre-1923: 100 million pages (?)

All pre-1923: 120-150 million pages

All literature: 280-320 million pages

Page 9: An Inordinate Fondness for Data: The Biodiversity Heritage Library

• Northeast Regional Scanning Facility (Boston)
• Jersey City Facility
• University of Illinois
• Natural History Museum, London
• Missouri Botanical Garden (Non-Scribe operation)
• Fedscan (Library of Congress)
• Smithsonian Libraries

Page 10: An Inordinate Fondness for Data: The Biodiversity Heritage Library

BHL Members: BHL-Europe

• Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin
• Natural History Museum, UK
• Narodni muzeum NMP CZ
• Angewandte Informationstechnik Forschungsgesellschaft mbH
• Freie Universität Berlin FUBBGBM
• Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts
• Naturhistorisches Museum Wien
• Hungarian Natural History Museum
• Museum and Institute of Zoology, Polish Academy of Sciences
• University of Copenhagen
• Stichting Nationaal Natuurhistorisch Museum, Naturalis
• National Botanic Garden of Belgium
• Royal Museum for Central Africa
• Royal Belgian Institute of Natural Sciences
• Bibliothèque nationale de France
• Museum national d’histoire naturelle
• Consejo Superior de Investigaciones Cientificas
• Università degli Studi di Firenze
• Royal Botanic Garden, Edinburgh
• Species 2000
• John Wiley & Sons limited
• Helsingin yliopisto UH-Viikki

Page 12: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Now Online

More than:
• 40,000 volumes
• 16 million pages
Only 290 million to go!

Avg. monthly growth rate:
• 1,500 volumes
• 600,000 pages
See you in 2048!
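
The "See you in 2048!" estimate can be checked with back-of-envelope arithmetic. The sketch below is only illustrative: it assumes the 600,000 pages/month rate stays constant, takes November 2009 as the starting point, and uses the deck's own figures (16 million pages online, 280-320 million pages of literature in total). The function and constant names are made up for this example.

```python
# Back-of-envelope projection using only figures quoted on the slides.
# Assumption: the 600,000 pages/month growth rate stays constant.

PAGES_ONLINE = 16_000_000        # "16 million pages"
PAGES_PER_MONTH = 600_000        # "Avg. monthly growth rate"
START_YEAR = 2009 + 10 / 12      # talk given in November 2009


def completion_year(total_pages: int) -> float:
    """Return the fractional year in which total_pages would be online."""
    remaining = total_pages - PAGES_ONLINE
    months = remaining / PAGES_PER_MONTH
    return START_YEAR + months / 12


low = completion_year(280_000_000)   # lower "All literature" estimate
high = completion_year(320_000_000)  # upper "All literature" estimate

print(f"Estimated completion between {low:.1f} and {high:.1f}")
```

Under these assumptions the slide's 2048 falls between the two bounds, so the quip is a fair rough estimate rather than a precise forecast.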

Page 13: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Ingest existing content: 12,000,000+ pages from other Internet Archive scanning partners

Page 14: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Acquiring other content ...

Researchers scanning their own work or literature relevant to their work

Journals that have scanned their content, but do not have a robust platform to host it

Page 15: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Biodiversity Heritage Library Permission Process

Working with non-profit publishers for sharing with the BHL

To digitize and mount works under copyright, BHL must obtain permission from the copyright holders.

Many biodiversity journals and monographs are published by non-profit institutions or learned societies whose mission is to promote research and learning.

Some of these institutions have not sold their rights to commercial publishers and are open to sharing with the BHL.

Page 16: An Inordinate Fondness for Data: The Biodiversity Heritage Library

So what? Does [fill in blank] do that?

… and more and faster?

Page 18: An Inordinate Fondness for Data: The Biodiversity Heritage Library

BHL is all about OPEN & SHARING

Page 19: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Remind me again why?

Page 20: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Access: Putting biodiversity literature in the hands of researchers
Set the data free: Suck it; mash it; broadcast it
Increase: Reuse, recycle, expand

An inordinate fondness for data

Page 21: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Stats: Usage

• Jan – Sep 2009
– 266,000 visitors
– 436,000 visits
– 2.1 million pageviews

• Daily average
– 970 visitors
– 1,600 visits / day
– 7,700 pageviews / day

[Charts: Jan – Sep 2009; Launch to 30 Sep 2009]

Page 22: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Global, coordinated development

New functionality from BHL-Europe
• Improved deduplication tools
• Semantic interface
• OAIS-compliant preservation infrastructure

Building a community of developers
• Funded & volunteer
• RubyBHL: http://github.com/mjy/rubyBHL
• PyBHL: http://linux.softpedia.com/get/Programming/Libraries/pybhl-51612.shtml

New partners, new content

Page 23: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Open Software & Development

BHL Bits: Portal code, utilities, services
http://code.google.com/p/bhl-bits/

Taxonomic Literature Group: Google Group for discussion of “taxonomic literature & the services required to make literature interoperable within biodiversity research and biodiversity informatics.”
http://groups.google.com/group/taxonlit

Page 24: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Open Data

Downloads: Simple tab-delimited exports of core data
http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf

Data model: DB schema as ERD
http://bhl-bits.googlecode.com/files/20090930_BHLDataModel.pdf
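
A tab-delimited export of this kind can be read with a few lines of standard-library Python. This is a minimal sketch, not BHL code: the column names (`TitleID`, `FullTitle`, `StartYear`) and the sample rows are hypothetical placeholders, and the real field names should be taken from the export schema document linked above.

```python
import csv
import io

# Hypothetical sample data; real column names are defined in the export schema.
SAMPLE_EXPORT = (
    "TitleID\tFullTitle\tStartYear\n"
    "1\tExample flora of somewhere\t1850\n"
    "2\tExample journal of zoology\t1902\n"
)


def load_export(handle):
    """Parse a tab-delimited export into a list of dicts keyed by column header."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


rows = load_export(io.StringIO(SAMPLE_EXPORT))

# Example query: titles that started publication before 1900.
pre_1900 = [row["FullTitle"] for row in rows if int(row["StartYear"]) < 1900]
print(pre_1900)
```

For a downloaded file, the same function would be called with an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the in-memory sample.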

Page 25: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Open Data

Page 26: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Open Source Pageturning UI

http://github.com/openlibrary/bookreader

Page 27: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Metadata: Feedback loop

Assigned to library staff for review & resolution

Page 28: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Services

Names Service
• Return all occurrences of a name throughout BHL digitized corpus
• Documentation: http://bit.ly/2e6sg9
• Access to 51 million name strings using TaxonFinder
• 1.4 million unique names
• Working out a strategy for obscure species
• Algorithm improvements to detect nomenclatural & taxonomic acts

OpenURL
• Facilitate links to citations: protologues, articles, references
• Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
• Useful to Nomenclators, Reference Systems: IPNI, Tropicos
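
To illustrate how a nomenclator or reference system might link a citation to BHL, here is a minimal sketch that builds an OpenURL-style query string. It is a sketch under assumptions, not the documented interface: the resolver address and the parameter names (`genre`, `title`, `volume`, `spage`, `date`, taken from common OpenURL 0.1 usage) should be checked against the OpenURL documentation page cited above, and the citation values are invented.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: resolver endpoint; confirm against the OpenURL documentation page.
RESOLVER = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(title, volume=None, start_page=None, year=None, genre="journal"):
    """Build an OpenURL 0.1-style link for a citation (parameter names assumed)."""
    params = {"genre": genre, "title": title}
    if volume is not None:
        params["volume"] = str(volume)
    if start_page is not None:
        params["spage"] = str(start_page)
    if year is not None:
        params["date"] = str(year)
    return f"{RESOLVER}?{urlencode(params)}"


# Invented citation, used only to show the shape of the resulting link.
url = build_openurl("Example journal of botany", volume=12, start_page=345, year=1895)
print(url)

# Round-trip check: the query string decodes back to the citation fields.
print(parse_qs(urlsplit(url).query))
```

Keeping the citation fields in a plain dictionary until the last step means optional fields are simply left out of the link when they are unknown.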

Page 30: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Services: OpenURL Disambiguation

Looking for:

BHL returns:

Page 31: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Services: OpenURL Results

Page 32: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Encyclopedia of Life

522,000 species pages linked to BHL
#1 referring site

Page 34: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Other Consumers

EarthCape Labs
• Sort/Search capabilities with harvested names
• YouTube demo: http://www.youtube.com/watch?v=qw7qw87JTOs

BioGUID
• BHL Name Timeline: http://bioguid.info/bhl/
• BHL Name Comparison: http://bioguid.info/bhl/compare.php

Page 39: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Global BHL

Based on open access

Open content

Collaboration

Shared development

Page 40: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Uh, so what's it mean to me?

1.9 million known species … most described once in a hard-to-find article … wouldn't it be nice to know more about your neighbors ...

Page 45: An Inordinate Fondness for Data: The Biodiversity Heritage Library

And thanks to ...

Page 46: An Inordinate Fondness for Data: The Biodiversity Heritage Library

Thanks for sticking around!