David Remsen lecture on Tuesday, Sept 15, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.
GLOBAL BIODIVERSITY INFORMATION FACILITY
David Remsen, Senior Programme Officer, GBIF
15 September 2009, Biodiversity Informatics
WWW.GBIF.ORG
Global Names Architecture
A Rationale
Brief History
Components
All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.
- Grimaldi & Engel, 2005, Evolution of the Insects
Biodiversity Information: A focus on taxa
Biodiversity Informatics: Creation, Curation, Discovery, Delivery of biodiversity information
A name that serves as a link to what has been learned in the past…
From T.E. Glover, The Fishes of Southwestern Japan, c.1870
A name that serves as a link to what has been learned in the past…
Unlike many other domains of science, historic publications have continued importance.
…and that we today add to the body of knowledge.
From T.E. Glover, The Fishes of Southwestern Japan, c.1870
GBIF index
177 million records (> 5%/month)
Gigabytes of text (~100 now)
All data mobilized through GBIF
Biodiversity Information
Species information “tied” to scientific names
The “Names Problem”
• Not stable: 5-10% of names invalidated per decade
• Not unique
• No complete list of names
• No complete list of species
• No agreement on how many, even within a single group
• Impacts discovery and access of information about species
The “Names Problem”
Properties of Names
• Orthographic (as labels of text that are “tied” to information about species)
• Nomenclature (as the core “words” of taxonomy that tie a name to an original publication and type)
• Taxonomy (as components of taxon definitions derived via authoritative taxonomic rigor)
Orthography
Orthography and the Names Problem
Objectives for Remediation
Variations in name spelling
Loligo pealeii
Loligo pealii
Loligo pealei
Some names are harder to spell than others
Actinobacillus actimomycetemcomitans
Actinobacillus actimycetemcomitans
Actinobacillus actinmycetemcomitans
Actinobacillus actinomicetemcomitans
Actinobacillus actinomy
Actinobacillus actinomyce
Actinobacillus actinomycemcomitans
Actinobacillus actinomyceremcomitans
Actinobacillus actinomycetam
Actinobacillus actinomycetamcomitans
Actinobacillus actinomycetecomitans
Actinobacillus actinomycetemcmitans
Actinobacillus actinomycetemcomintans
Actinobacillus actinomycetemcomitance
Actinobacillus actinomycetemcomitans
Actinobacillus actinomycetemcomitants
Actinobacillus actinomycetemcommitans
Actinobacillus actinomycetemocimitans
Actinobacillus actinomycetencomitans
Actinobacillus actinomycetum
Actinobacillus actinomyctemcomitans
Actinobacillus actinomyectomcomitans
Actinobacillus actinomyetemcomitans
Actinobacillus actinonmycetemcomitans
Actinobacillus actionomycetemcomitans
Actinobacillus actynomicetemcomitans
Actinobacillus antinomycetemcomitans
• Difficulties with Latinized Names
• Transcription errors
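Spelling variants like these can often be reconciled with approximate string matching. The following is a minimal sketch, not the method used by GBIF or uBio: it uses Python's standard `difflib` module, and the reference list, the function name `match_name`, and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list; in practice this would come from an
# authoritative nomenclatural source. The spellings here are assumed correct.
REFERENCE_NAMES = [
    "Loligo pealeii",
    "Actinobacillus actinomycetemcomitans",
    "Halichondria panicea",
]


def match_name(verbatim, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return the most similar reference name, or None if none is close enough."""
    hits = get_close_matches(verbatim, reference, n=1, cutoff=cutoff)
    return hits[0] if hits else None


if __name__ == "__main__":
    for variant in ["Loligo pealii", "Actinobacillus actinomycetemcomitants"]:
        print(variant, "->", match_name(variant))
```

The cutoff is a trade-off: heavily truncated variants such as "Actinobacillus actinomy" fall below it and stay unmatched, which is safer than guessing but leaves them for manual review.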
Which one is the correct one?
Agalinus paupercula borealis
Agalinus pauperculum borealis
Agalinis paupercula var. Borealis
Agalinus pauperculum var. borealis
Agalinus paupercula var. borealis
Agalinus paupercula var. borealis Pennell
Agalinus paupercula Britton var. borealis Pennell
Agalinus paupercula (Gray) Britt. var. borealis Pennell
Agalinis paupercula (A.Gray) Britton var. borealis Pennell
Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934
Gerardia paupercula borealis
Gerardia paupercula var. borealis
Gerardia paupercula var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell
Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell
Gerardia paupercula Britton ssp. borealis Pennell
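Many of these strings differ only in authorship, year, or rank marker. One way to group them is to reduce each string to a canonical form (genus plus epithets). The sketch below is a deliberately naive illustration; the function name `canonical_form` and the rank-marker list are assumptions, and real name parsers handle many more cases (lowercase author particles such as "ex" or "van", hybrids, and so on).

```python
import re

# Assumed, incomplete set of infraspecific rank markers.
RANK_MARKERS = {"var.", "subsp.", "ssp.", "f."}


def canonical_form(name_string):
    """Keep the genus and the epithets; drop authors, years and rank markers."""
    # Drop parenthesised author citations such as "(A. Gray)" or "(Pennell)".
    stripped = re.sub(r"\([^)]*\)", " ", name_string)
    tokens = stripped.split()
    if not tokens:
        return ""
    genus, rest = tokens[0], tokens[1:]
    epithets = []
    for i, token in enumerate(rest):
        if token in RANK_MARKERS:
            continue
        # A word directly after a rank marker is treated as an epithet even
        # when it is wrongly capitalised (e.g. "var. Borealis").
        follows_rank = i > 0 and rest[i - 1] in RANK_MARKERS
        if token.isalpha() and (token.islower() or follows_rank):
            epithets.append(token.lower())
    return " ".join([genus] + epithets)


if __name__ == "__main__":
    print(canonical_form(
        "Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell"
    ))
```

This groups strings by canonical form only. It does not correct "Agalinus" to "Agalinis" or decide that the Agalinis and Gerardia names belong together; those are the orthographic and nomenclatural steps discussed elsewhere in the talk.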
Many ways to correctly spell a name
Should GBIF/EoL/BHL display all/one/some?
Objectives
Informatics can contribute
• Index names occurring in content we wish to publicise and access
• Develop tools to extract, catalog, and match names
• Reconcile names to authoritative name sources via a common resolution path
• Reconcile name occurrences to taxonomic concepts via a common concept resolution path
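The first two objectives (extract names from content, then index them) can be sketched as a small inverted index. This is an illustrative assumption about how such a tool might look, not a description of an existing GBIF or uBio service: the regular expression is a crude binomial pattern, and `KNOWN_GENERA` and `index_names` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical genus dictionary used to filter out false positives
# such as "The squid".
KNOWN_GENERA = {"Loligo", "Halichondria", "Peranema"}

# Crude pattern: a capitalised word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def index_names(documents):
    """Map each name found in the texts to the set of document ids using it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for genus, epithet in BINOMIAL.findall(text):
            if genus in KNOWN_GENERA:
                index[f"{genus} {epithet}"].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {
        "a": "The squid Loligo pealeii is common.",
        "b": "Halichondria panicea and Loligo pealeii were sampled.",
    }
    for name, doc_ids in sorted(index_names(docs).items()):
        print(name, sorted(doc_ids))
```

An index like this records names exactly as they occur in content; reconciling those occurrences to an authoritative source is a separate step.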
Nomenclature
Nomenclatural aspects of the names problem.
Approaches for remediating them
Don’t pass on bad information.
How can we determine the status of the names we discover in content that we serve?
Nomenclatural changes impact search and retrieval
Where can I find out that these names are related?
Zoological Code doesn’t track recombinations
Botanical Code does.
Homonyms
Peranema – the fern
Peranema – the euglenid
How many Peranema are there?
How can I tell them apart?
Homonyms
Kingdom Phylum Class Order Family Genus
Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe
Plantae Oenanthe Oenanthe
Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe
Plantae Orchidaceae Oenanthe
Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe
Animalia Chordata Aves Passeriformes Turdidae Oenanthe
Animalia Chordata Actinopterygii Perciformes Pomatomidae Pomatomus
Animalia Chordata Pisces Perciformes Serranidae Pomatomus
Taxonomic context alone doesn’t tell me enough.
Approaches to remediation
Consolidate the major nomenclatural databases
• A single nomenclatural dictionary
• Populate with provisionally verified records and enable open annotation
• Provides nomenclatural status of a name
• Collectively identifies all homonyms. Identifiers used in taxonomic data provide disambiguation context
• Ties all distinct nomenclatural combinations to the original published name.
Informatics
• Promote global identifiers and simple resolution pathway for these data
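A single nomenclatural dictionary of this kind can be pictured as a set of records, each with its own identifier, where later combinations point back to the original name. The sketch below is an assumed data model for illustration only; the identifiers (`n1` to `n4`), field names, and functions are hypothetical. The link from the Gerardia combination to the Agalinis name follows the author citations shown earlier, where "(Pennell) Deam" indicates a combination based on Pennell's name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    record_id: str                     # hypothetical identifier scheme
    name: str
    note: str
    original_id: Optional[str] = None  # link to the original published name


RECORDS = [
    NameRecord("n1", "Peranema", "the fern"),
    NameRecord("n2", "Peranema", "the euglenid"),
    NameRecord("n3", "Agalinis paupercula var. borealis", "original name"),
    NameRecord("n4", "Gerardia paupercula var. borealis",
               "later combination", original_id="n3"),
]
BY_ID = {record.record_id: record for record in RECORDS}


def homonyms(name):
    """All records sharing the same spelling; their ids tell them apart."""
    return [record for record in RECORDS if record.name == name]


def original_name(record_id):
    """Follow original_id links back to the originally published name."""
    record = BY_ID[record_id]
    while record.original_id is not None:
        record = BY_ID[record.original_id]
    return record


if __name__ == "__main__":
    print([(r.record_id, r.note) for r in homonyms("Peranema")])
    print(original_name("n4").name)
```

With identifiers attached to data instead of bare name strings, a record citing `n2` is unambiguous even though its name string is shared with `n1`.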
Taxonomy
Taxonomic Examples of the Names problem
Approaches for remediating them
Taxonomic synonyms
Halichondria panicea (Pallas 1776) sec Van Soest 2002 (WoRMS)
Consequences of Splitting
Taxon Concept problem: What does someone mean when they refer to P. carinii?
The Perils of Lumping
Bear Lodge meadow jumping mouse: Zapus hudsonius campestris
Preble’s meadow jumping mouse: Zapus hudsonius preblei
Dr. Rob Roy Ramey says: INCLUDES
Dr. Tim King says: DOES NOT INCLUDE
What should a search for “Zapus hudsonius campestris” return?
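One possible answer is that the result depends on whose taxonomic view the search follows. The sketch below models each concept as a name "according to" a source, together with the names that concept includes. It assumes the slide pairs Ramey with INCLUDES and King with DOES NOT INCLUDE; the data structure, the abbreviated names, and `expand_query` are hypothetical.

```python
# Each key is (name, according_to); each value is the set of names whose
# records fall inside that concept. The pairing of authors with views is an
# assumption read from the slide.
CONCEPTS = {
    ("Z. h. campestris", "Ramey"): {"Z. h. campestris", "Z. h. preblei"},
    ("Z. h. campestris", "King"): {"Z. h. campestris"},
    ("Z. h. preblei", "King"): {"Z. h. preblei"},
}


def expand_query(name, according_to):
    """Names a search should cover under one taxonomic view.

    Falls back to the bare name when no concept is recorded for that view.
    """
    return sorted(CONCEPTS.get((name, according_to), {name}))


if __name__ == "__main__":
    for view in ("Ramey", "King"):
        print(view, expand_query("Z. h. campestris", view))
```

The same query string returns one name under one view and two under the other, which is why a name alone is not enough to define what a search should return.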
Different taxonomic views, different # species, different names
Taxonomic Backbones: Scope and completeness
Organisational value of Non-Taxonomic Lists
Approaches to remediation
• An inventory of different taxonomic catalogues
• Inform if there are concept issues for the species
• Provide synonymised taxon concepts with unique and resolvable identifiers
• Multiple classifications via checklists and catalogues accessible and utilised as organisational frameworks for species information
Summary
A data publication framework that enables
• A complete index of all names that are tied to information about species
• Tools and infrastructure to support this.
A complete index of verified nomenclature and an identification and resolution system to make it easy to tie a name to an authoritative record.
A global taxonomic resolution system that allows a particular usage of a name to be tied to a defined taxon.
A system that puts taxonomy as a global organisational framework for species information.
Inventory and Index
uBio Indexes
Web Service outputs Taxon Object
Web Service calls from client applications
Taxonomic organisation of content
Indexes support processes that support discovery
That enable new and better tools and services
Formalise the Architecture
Coordinate Communities of Interest
Summary: GNA Objectives
A complete index of names tied to information about species reconciled to a common and verified nomenclatural dictionary.
This same dictionary forms the basis for multiple expressions of taxonomic catalogues, regional checklists, and thematic lists of species.
These lists are openly accessible and tied to services and processes that enable them to be effectively employed in data organisation and retrieval.
Collectively, these components serve the delivery and utilisation of biological knowledge.
Thank you
[email protected]:dremsen