GLOBAL BIODIVERSITY INFORMATION FACILITY
David Remsen, Senior Programme Officer, GBIF
15 September 2009, Biodiversity Informatics
WWW.GBIF.ORG
Global Names Architecture
A Rationale
Brief History
Components

Remsen Lect04


David Remsen lecture on Tuesday, Sept 15, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.


Page 2: Remsen Lect04

All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.

- Grimaldi & Engel, 2005, Evolution of the Insects

Biodiversity Information: A focus on taxa

Biodiversity Informatics: Creation, Curation, Discovery, Delivery of biodiversity information

Page 3: Remsen Lect04

A name that serves as a link to what has been learned in the past…

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

Page 4: Remsen Lect04

A name that serves as a link to what has been learned in the past…

Unlike many other domains of science, historic publications have continued importance.

Page 5: Remsen Lect04

…and that we today add to the body of knowledge.

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

Page 6: Remsen Lect04

GBIF index

177 million records (> 5%/month)
Gigabytes of text (~100 now)

All data mobilized through GBIF

Page 7: Remsen Lect04

Biodiversity Information

Species information “tied” to scientific names

Page 8: Remsen Lect04

The “Names Problem”

• Not stable: 5-10% of names invalidated per decade
• Not unique
• No complete list of names
• No complete list of species
• No agreement on how many species there are, even within a single group

Impacts discovery and access of information about species

Page 9: Remsen Lect04

The “Names Problem”

Properties of names:
• Orthography (as text labels that are “tied” to information about species)
• Nomenclature (as the core “words” of taxonomy that tie a name to an original publication and type)
• Taxonomy (as components of taxon definitions derived through authoritative taxonomic rigour)

Page 10: Remsen Lect04

Orthography

Orthography and the Names Problem

Objectives for Remediation

Page 11: Remsen Lect04

Variations in name spelling

Loligo pealeii
Loligo pealii
Loligo pealei

Page 12: Remsen Lect04

Some names are harder to spell than others

Actinobacillus actimomycetemcomitans
Actinobacillus actimycetemcomitans
Actinobacillus actinmycetemcomitans
Actinobacillus actinomicetemcomitans
Actinobacillus actinomy
Actinobacillus actinomyce
Actinobacillus actinomycemcomitans
Actinobacillus actinomyceremcomitans
Actinobacillus actinomycetam
Actinobacillus actinomycetamcomitans
Actinobacillus actinomycetecomitans
Actinobacillus actinomycetemcmitans
Actinobacillus actinomycetemcomintans
Actinobacillus actinomycetemcomitance
Actinobacillus actinomycetemcomitans
Actinobacillus actinomycetemcomitants

Actinobacillus actinomycetemcommitans
Actinobacillus actinomycetemocimitans
Actinobacillus actinomycetencomitans
Actinobacillus actinomycetum
Actinobacillus actinomyctemcomitans
Actinobacillus actinomyectomcomitans
Actinobacillus actinomyetemcomitans
Actinobacillus actinonmycetemcomitans
Actinobacillus actionomycetemcomitans
Actinobacillus actynomicetemcomitans
Actinobacillus antinomycetemcomitans

• Difficulties with Latinized names
• Transcription errors

Which one is the correct one?
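One way informatics can help choose among variants like these is approximate string matching against a reference list of verified spellings. The sketch below uses Python's standard `difflib` module; the three-name `REFERENCE_NAMES` list, the `best_match` helper and the 0.85 similarity cutoff are illustrative assumptions, not a description of how GBIF or uBio actually match names.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical reference list of verified spellings (illustration only).
REFERENCE_NAMES = [
    "Actinobacillus actinomycetemcomitans",
    "Actinobacillus pleuropneumoniae",
    "Loligo pealeii",
]


def best_match(name, reference=REFERENCE_NAMES, cutoff=0.85):
    """Return (closest reference name, similarity ratio), or (None, 0.0) if nothing passes the cutoff."""
    matches = get_close_matches(name, reference, n=1, cutoff=cutoff)
    if not matches:
        return None, 0.0
    # get_close_matches does not report the score, so recompute it for display.
    ratio = SequenceMatcher(None, name, matches[0]).ratio()
    return matches[0], round(ratio, 3)


if __name__ == "__main__":
    for variant in [
        "Actinobacillus actinomycetemcomitants",
        "Loligo pealei",
        "Actinobacillus actinomy",
    ]:
        print(variant, "->", best_match(variant))
```

With these settings a transcription error such as "Actinobacillus actinomycetemcomitants" resolves to the reference spelling, while a truncated form such as "Actinobacillus actinomy" falls below the cutoff and stays unmatched. A production matcher would also need to avoid confidently merging two genuinely different names that happen to be spelled alike.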

Page 13: Remsen Lect04

Agalinus paupercula borealis
Agalinus pauperculum borealis
Agalinis paupercula var. Borealis
Agalinus pauperculum var. borealis
Agalinus paupercula var. borealis
Agalinus paupercula var. borealis Pennell
Agalinus paupercula Britton var. borealis Pennell
Agalinus paupercula (Gray) Britt. var. borealis Pennell
Agalinis paupercula (A.Gray) Britton var. borealis Pennell
Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934

Gerardia paupercula borealis
Gerardia paupercula var. borealis
Gerardia paupercula var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam

Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell
Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell
Gerardia paupercula Britton ssp. borealis Pennell

Many ways to correctly spell a name

Should GBIF/EoL/BHL display all/one/some?
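Many of these strings differ only in authorship, rank markers or capitalisation. A minimal sketch of reducing each string to a canonical form (genus plus lowercase epithets) so that variants can be grouped is given here. The `canonical_name` and `group_by_canonical` helpers and the small set of rank markers are assumptions for illustration; an author particle written in lowercase (for example "de") would be mistaken for an epithet, and dedicated name parsers handle far more cases.

```python
import re
from collections import defaultdict

# A small, assumed set of infraspecific rank markers; real parsers recognise many more.
RANK_MARKERS = {"var.", "subsp.", "ssp.", "f.", "forma"}
# Here an epithet is a lowercase Latin word, optionally hyphenated.
EPITHET = re.compile(r"^[a-z][a-z-]*$")


def canonical_name(full_name):
    """Reduce a name string to genus + epithets, dropping authors, years and rank markers."""
    tokens = full_name.split()
    if not tokens:
        return ""
    parts = [tokens[0].capitalize()]  # the genus
    after_rank = False
    for token in tokens[1:]:
        if token in RANK_MARKERS:
            after_rank = True
            continue
        # The word right after a rank marker is treated as an epithet,
        # even if it was miscapitalised (e.g. "Borealis").
        candidate = token.lower() if after_rank else token
        after_rank = False
        if EPITHET.match(candidate):
            parts.append(candidate)
        # Anything else (authors, parenthesised authors, years) is ignored.
    return " ".join(parts)


def group_by_canonical(names):
    """Group raw name strings under their canonical form."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_name(name)].append(name)
    return dict(groups)


VARIANTS = [
    "Agalinis paupercula var. Borealis",
    "Agalinis paupercula (A.Gray) Britton var. borealis Pennell",
    "Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934",
    "Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell",
    "Gerardia paupercula Britton ssp. borealis Pennell",
]
```

Grouping these five variants yields three canonical forms. The misspelt genus "Agalinus" is still kept apart from "Agalinis", which calls for fuzzy matching, and relating Agalinis to Gerardia is a nomenclatural question rather than a spelling one.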

Page 14: Remsen Lect04

Objectives

Informatics can contribute:
• Index the names occurring in content we wish to publicise and make accessible
• Develop tools to extract, catalogue, and match names
• Reconcile names to authoritative name sources via a common resolution path
• Reconcile name occurrences to taxonomic concepts via a common concept resolution path
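The first objectives (find names occurring in content, then index them against an authoritative source) can be sketched as follows. The regular expression is a deliberately crude binomial heuristic, and `NAME_DICTIONARY` with its `name:N` identifiers is a hypothetical stand-in for an authoritative name source; in this sketch, candidates not found in the dictionary are simply discarded rather than reconciled.

```python
import re
from collections import defaultdict

# Candidate binomial: capitalised genus followed by a lowercase epithet (a crude heuristic).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical dictionary of known names -> made-up identifiers, for illustration only.
NAME_DICTIONARY = {
    "Loligo pealeii": "name:1",
    "Halichondria panicea": "name:2",
}


def find_names(text, dictionary=NAME_DICTIONARY):
    """Return known names found in text, in order of appearance."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        name = f"{genus} {epithet}"
        # Discard candidates such as "The squid" that are not in the dictionary.
        if name in dictionary:
            found.append(name)
    return found


def build_index(documents, dictionary=NAME_DICTIONARY):
    """Map each name identifier to the set of document ids that mention the name."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name in find_names(text, dictionary):
            index[dictionary[name]].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {
        "doc1": "The squid Loligo pealeii was sampled.",
        "doc2": "Halichondria panicea and Loligo pealeii co-occur.",
    }
    print(build_index(docs))
```

Keying the index on identifiers rather than raw strings is what would later let spelling variants and synonyms point at the same entry.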

Page 15: Remsen Lect04

Nomenclature

Nomenclatural aspects of the names problem.

Approaches for remediating them

Page 16: Remsen Lect04

Don’t pass on bad information.

How can we determine the status of the names we discover in content that we serve?


Page 17: Remsen Lect04

Nomenclatural changes impact search and retrieval

Where can I find out that these names are related?

Zoological Code doesn’t track recombinations

Botanical Code does.


Page 18: Remsen Lect04

Nomenclatural changes impact search and retrieval

Page 19: Remsen Lect04

Homonyms

Peranema – the fern

Peranema – the euglenid

How many Peranema are there?

How can I tell them apart?


Page 20: Remsen Lect04

Homonyms

Kingdom Phylum Class Order Family Genus

Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe

Plantae Oenanthe Oenanthe

Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe

Plantae Orchidaceae Oenanthe

Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe

Animalia Chordata Aves Passeriformes Turdidae Oenanthe

Animalia Chordata Actinopterygii Perciformes Pomatomidae Pomatomus

Animalia Chordata Pisces Perciformes Serranidae Pomatomus

Taxonomic context alone doesn’t tell me enough.

Page 21: Remsen Lect04

Approaches to remediation

Consolidate the major nomenclatural databases into a single nomenclatural dictionary:
• Populate it with provisionally verified records and enable open annotation
• Provide the nomenclatural status of a name
• Collectively identify all homonyms; identifiers used in taxonomic data provide disambiguation context
• Tie all distinct nomenclatural combinations to the original published name

Informatics:
• Promote global identifiers and a simple resolution pathway for these data
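A minimal sketch of such a dictionary, assuming a record shape of our own devising: each record carries an identifier, the governing code and a higher-taxon context, and a recombination points at the record of its original combination. The `ex:` identifiers and the helper functions are hypothetical. The Oenanthe homonym comes from the slides above, and Puma concolor (Linnaeus, 1771) is used because its original combination was Felis concolor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    record_id: str                     # hypothetical identifier
    name: str                          # canonical name string
    code: str                          # governing code, e.g. "ICN" (botany) or "ICZN" (zoology)
    context: str                       # higher taxon supplied for disambiguation
    original_id: Optional[str] = None  # original combination, if this record is a recombination


RECORDS = [
    NameRecord("ex:1", "Oenanthe", "ICN", "Plantae: Apiaceae"),
    NameRecord("ex:2", "Oenanthe", "ICZN", "Animalia: Aves"),
    NameRecord("ex:3", "Felis concolor", "ICZN", "Animalia: Felidae"),
    NameRecord("ex:4", "Puma concolor", "ICZN", "Animalia: Felidae", original_id="ex:3"),
]
BY_ID = {record.record_id: record for record in RECORDS}


def lookup(name):
    """All dictionary records carrying this name string."""
    return [record for record in RECORDS if record.name == name]


def is_homonym(name):
    """True if the same name string is attached to more than one record."""
    return len(lookup(name)) > 1


def original_combination(record_id):
    """Follow original_id links back to the originally published combination."""
    record = BY_ID[record_id]
    while record.original_id is not None:
        record = BY_ID[record.original_id]
    return record.name
```

Because a lookup returns every record for a string, a client that sees two Oenanthe records knows it must ask for, or supply, an identifier instead of silently picking one.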

Page 22: Remsen Lect04

Taxonomy

Taxonomic examples of the names problem

Approaches for remediating them

Page 23: Remsen Lect04

Taxonomic synonyms

Halichondria panicea (Pallas 1776) sec Van Soest 2002 (WoRMS)

Page 24: Remsen Lect04

Consequences of Splitting

Taxon concept problem: what does someone mean when they refer to P. carinii?

Page 25: Remsen Lect04

The Perils of Lumping

Bear Lodge meadow jumping mouse: Zapus hudsonius campestris

Preble’s meadow jumping mouse: Zapus hudsonius preblei

Dr. Rob Roy Ramey says Z. h. campestris INCLUDES Z. h. preblei.

Dr. Tim King says Z. h. campestris DOES NOT INCLUDE Z. h. preblei.

What should a search for “Zapus hudsonius campestris” return?

Page 26: Remsen Lect04

Different taxonomic views, different numbers of species, different names

Taxonomic Backbones: Scope and completeness

Page 27: Remsen Lect04

Organisational value of Non-Taxonomic Lists

Page 28: Remsen Lect04

Approaches to remediation

• An inventory of different taxonomic catalogues
• Indicate whether there are concept issues for a species
• Provide synonymised taxon concepts with unique and resolvable identifiers
• Make multiple classifications, via checklists and catalogues, accessible and utilised as organisational frameworks for species information
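The concept problem can be sketched by treating a taxon concept as a name plus the source ("sec.") whose circumscription defines it, each with its own identifier. In this hypothetical example, two concepts share the name Zapus hudsonius campestris but differ in whether they include Z. h. preblei; "source A", "source B", the `concept:N` identifiers and the `search_terms` helper are placeholders, not records or functions from any real catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    concept_id: str       # hypothetical resolvable identifier
    name: str             # the name used for the taxon
    according_to: str     # source whose circumscription defines the concept ("sec.")
    includes: frozenset   # names this concept treats as referring to the same taxon


CONCEPTS = [
    TaxonConcept("concept:1", "Zapus hudsonius campestris", "source A",
                 frozenset({"Zapus hudsonius campestris", "Zapus hudsonius preblei"})),
    TaxonConcept("concept:2", "Zapus hudsonius campestris", "source B",
                 frozenset({"Zapus hudsonius campestris"})),
]


def concepts_for(name):
    """All concepts that use this name."""
    return [c for c in CONCEPTS if c.name == name]


def search_terms(name, according_to=None):
    """Names a search should expand to; a bare name is ambiguous if several concepts use it."""
    candidates = concepts_for(name)
    if according_to is not None:
        candidates = [c for c in candidates if c.according_to == according_to]
    if len(candidates) != 1:
        raise LookupError(f"{name!r} maps to {len(candidates)} concepts; specify according_to")
    return sorted(candidates[0].includes)
```

Raising an error when a bare name maps to several concepts is one design choice; a service could instead return every candidate concept and let the caller decide.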

Page 29: Remsen Lect04

Summary

A data publication framework that enables a complete index of all names tied to information about species, with the tools and infrastructure to support it.

A complete index of verified nomenclature, and an identification and resolution system that makes it easy to tie a name to an authoritative record.

A global taxonomic resolution system that allows a particular usage of a name to be tied to a defined taxon.

A system that establishes taxonomy as a global organisational framework for species information.

Page 30: Remsen Lect04

Inventory and Index

Page 31: Remsen Lect04

uBio Indexes

Page 32: Remsen Lect04

Web Service outputs Taxon Object

Page 33: Remsen Lect04

Web Service calls from client applications

Page 34: Remsen Lect04

Taxonomic organisation of content

Page 35: Remsen Lect04

Taxonomic organisation of content

Page 36: Remsen Lect04

Indexes support processes that support discovery

Page 37: Remsen Lect04

That enable new and better tools and services

Page 38: Remsen Lect04

Formalise the Architecture

Page 39: Remsen Lect04

Coordinate Communities of Interest

Page 40: Remsen Lect04

Summary: GNA Objectives

A complete index of names tied to information about species reconciled to a common and verified nomenclatural dictionary.

This same dictionary forms the basis for multiple expressions of taxonomic catalogues, regional checklists, and thematic lists of species.

These lists are openly accessible and tied to services and processes that enable them to be effectively employed in data organisation and retrieval.

Collectively, these components serve the delivery and utilisation of biological knowledge.

Page 41: Remsen Lect04

Thank you

[email protected]:dremsen