Data Cleaning and Data Publishing Workshop 2013 18-22 February, Nairobi, Kenya Javier Otegui @jotegui TAXONOMIC ASSESSMENTS

Page 2: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – WHAT IS IT?

• What is taxonomy?
  ◦ CBD: “Taxonomy is the science of naming, describing and classifying organisms and includes all plants, animals and microorganisms of the world”
  ◦ Using morphological, behavioral, genetic and biochemical observations, taxonomists identify, describe and arrange species into classifications, including those that are new to science.
• Taxonomy is related to:
  ◦ Identifying an organism
  ◦ Placing the organism in context with the rest of living organisms

Page 3: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – TAXONOMIC NAMES

• Taxonomy is based on names
• Humans have always given names
• Binomial nomenclature
• Names define individuals and groups
• Each name defines a taxon

Page 4: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – HIERARCHIES

• Organization and classification of organisms
• According to common features
• Taxonomic classification

http://wp.lps.org/jbenson2/blog/2012/01/18/january-18-taxonomy-chart-lab

Page 5: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – NAMES AND TAXONOMIES

• Taxonomy has a strong subjective component
• Classifications depend on the expertise and point of view of the specialist
• Lots of episodes of:
  ◦ Name removals
  ◦ Taxon splits
  ◦ Taxon merges
  ◦ Different organizations according to different features
• Some cases…

Page 6: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – SYNONYMY

• Two different names are applied to the same organism
• An expert argues that two originally different taxa are the same
• Generally one name remains; the other is considered a synonym and is no longer valid

Photo: Arthur Chapman

Antilocapra americana Ord, 1815

Antilocapra anteflexa Gray, 1855
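
A minimal Python sketch of how a synonym could be resolved to the name that remains valid. The `SYNONYMS` table and the `resolve_synonym` function are hypothetical, and the table holds only the pair shown on this slide, assuming the older name is the one that remains.

```python
# Hypothetical synonym table: synonym -> name that remains valid.
# Only the pair from the slide is included; a real table would come
# from an authoritative source.
SYNONYMS = {
    "Antilocapra anteflexa": "Antilocapra americana",
}


def resolve_synonym(name: str) -> str:
    """Return the valid name for a known synonym, or the name unchanged."""
    return SYNONYMS.get(name, name)


print(resolve_synonym("Antilocapra anteflexa"))
```

Names that are not in the table pass through unchanged, so the lookup never invents a correction.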

Page 8: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – HOMONYMY

• The same name is applied to two different organisms
• A new description uses an “already taken” name
• Generally, the oldest name prevails and the newest has to change

Echidna Cuvier, 1797

Echidna Forster, 1777

Photo: David R

Photo: Petr Baum

Page 10: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – HOMONYMY

• The same name is applied to two different organisms
• A new description uses an “already taken” name
• Generally, the oldest name prevails and the newest has to change

Photo: Petr Baum

Echidna Cuvier, 1797 → Tachyglossus Illiger, 1811
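
A short Python sketch of why the bare name is not enough to resolve a homonym. It assumes, for illustration only, that the authorship string is stored next to the name; the `HOMONYMS` table and `current_genus` function are hypothetical.

```python
# Hypothetical lookup keyed by (name, authorship). The bare name "Echidna"
# is ambiguous, so the authorship is part of the key.
HOMONYMS = {
    ("Echidna", "Forster, 1777"): "Echidna",       # older use, name kept
    ("Echidna", "Cuvier, 1797"): "Tachyglossus",   # newer use, name changed
}


def current_genus(name, authorship):
    """Return the current genus name, or None if the pair is unknown."""
    return HOMONYMS.get((name, authorship))


print(current_genus("Echidna", "Cuvier, 1797"))
```

A record that carries only "Echidna" with no authorship cannot be resolved by this lookup, which is the practical problem homonyms cause.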

Page 11: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – ALTERNATE CLASSIFICATIONS

• Taxonomic classifications are subjective
• Based on common features
• Different experts select different features
• Scientific names might remain the same
• Higher level taxa or groups might differ
• See example…

Page 12: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – ALTERNATE CLASSIFICATIONS

Page 13: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – NAME VS CONCEPT

• Issues with names mean that taxonomic names alone are not effective identifiers
• New term: taxon concept
  ◦ Name: a concatenation of characters
  ◦ Concept: name + context
  ◦ Even if the name is the same, the concept is different when it applies to different organisms
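
The name/concept distinction can be sketched as a small data structure. This is an illustrative Python sketch, not a standard model: the `TaxonConcept` class is hypothetical, and the authorship string stands in for the "context" as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    name: str          # the name: just a string of characters
    according_to: str  # the context (here, simplified to the authorship)


a = TaxonConcept("Echidna", "Forster, 1777")
b = TaxonConcept("Echidna", "Cuvier, 1797")

same_name = a.name == b.name   # the strings are identical
same_concept = a == b          # the concepts are not
print(same_name, same_concept)
```

Comparing on the name alone treats the two as equal; comparing on name plus context keeps them apart.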

Page 15: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – STANDARDS

• Taxonomic names: scientific name and all higher taxa
• Taxon concept: taxonConceptID, nameAccordingTo, namePublishedIn…

Source in which the specific taxon concept circumscription is defined or implied

Page 16: ASSESSMENTS-Taxonomic-Assessments-Javier

TAXONOMY – STANDARDS

• Taxonomic names: scientific name and all higher taxa
• Taxon concept: taxonConceptID, nameAccordingTo, namePublishedIn…

For taxa that result from identifications, a reference to the keys, monographs, experts and other sources should be given
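
As a rough illustration, a record can be checked for the taxon concept fields named above. In this Python sketch the three concept terms come from the slide; the `scientificName` and `genus` keys, the `record` layout, and the `missing_concept_terms` helper are assumptions. The concept fields are left empty on purpose rather than filled with invented references.

```python
# Terms listed on the slide for describing the taxon concept.
CONCEPT_TERMS = ("taxonConceptID", "nameAccordingTo", "namePublishedIn")

# Hypothetical record: the name is filled in, the concept fields are empty.
record = {
    "scientificName": "Antilocapra americana Ord, 1815",
    "genus": "Antilocapra",
    "taxonConceptID": "",
    "nameAccordingTo": "",
    "namePublishedIn": "",
}


def missing_concept_terms(rec):
    """List the concept terms that are absent or blank in a record."""
    return [t for t in CONCEPT_TERMS if not (rec.get(t) or "").strip()]


print(missing_concept_terms(record))
```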

Page 17: ASSESSMENTS-Taxonomic-Assessments-Javier

NOISE – MISSPELLINGS

• One of the most common issues
• Random alteration of one or more characters in a name
• Possibilities:
  ◦ Purely accidental
  ◦ Due to limited knowledge of the name
• Tend to appear at the time of digitization

Page 18: ASSESSMENTS-Taxonomic-Assessments-Javier

NOISE – MISSPELLINGS

Photo: Barracuda1983

Pipistrellus

Pipistrelus, Pippistrellus, Pipistrella, Pippistrela …
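
Variants like these can often be caught by approximate string matching against a list of accepted names. The sketch below uses `difflib.get_close_matches` from the Python standard library. The `REFERENCE_NAMES` list, the `suggest_name` helper, and the 0.7 cutoff are assumptions chosen for illustration, not recommended values.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real one would come from a checklist.
REFERENCE_NAMES = ["Pipistrellus", "Myotis", "Nyctalus"]


def suggest_name(name, cutoff=0.7):
    """Return the name if it is in the list, else the closest match or None."""
    if name in REFERENCE_NAMES:
        return name
    matches = get_close_matches(name, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for variant in ["Pipistrelus", "Pippistrellus", "Pipistrella", "Pippistrela"]:
    print(variant, "->", suggest_name(variant))
```

A suggestion is only a candidate: two valid names can be as similar to each other as a misspelling is to its target, so the result should be reviewed before it replaces the original value.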

Page 19: ASSESSMENTS-Taxonomic-Assessments-Javier

NOISE – MISIDENTIFICATIONS & EMPTINESS

• Misidentification
  ◦ A more obscure type of error
  ◦ A taxon is wrongly identified
  ◦ The only way to solve it is close examination by an expert taxonomist
  ◦ Might not be resolvable at all
• Emptiness
  ◦ Seriousness depends on the missing level(s)
  ◦ Importance decreases as taxonomic rank increases
  ◦ Scientific name missing?
  ◦ Special cases: homonymies, synonymies…
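
Emptiness is the easier of the two to detect automatically. A minimal Python sketch, assuming each record is a dictionary with one key per rank; the key names, the `RANKS` list, and the `missing_ranks` helper are hypothetical.

```python
# Assumed field names, ordered from highest rank to the scientific name.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "scientificName"]


def missing_ranks(record):
    """Return the ranks that are absent or blank, highest rank first."""
    return [r for r in RANKS if not (record.get(r) or "").strip()]


record = {
    "kingdom": "Animalia",
    "family": "Vespertilionidae",
    "genus": "Pipistrellus",
    "scientificName": "",
}
print(missing_ranks(record))
```

Because the list is ordered by rank, a gap near the end of the output (such as `scientificName`) signals a more serious problem than a gap near the start.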

Page 20: ASSESSMENTS-Taxonomic-Assessments-Javier

NOISE – NATURE OF TAXONOMY

• Not defining the taxonomy used
  ◦ Can have the same effect as having only the scientific name
  ◦ We might complete the hierarchy, but how reliable is it?
  ◦ Provide the taxonomy employed (taxonomic concept)
  ◦ Use identification qualifiers: “Sensu Otegui, 2013”, or “Sensu Biologia Centrali Americana”
• Synonymies and homonymies
  ◦ Again, background information (metadata, taxonomic concept) is needed
  ◦ Use identification qualifiers

Page 21: ASSESSMENTS-Taxonomic-Assessments-Javier

NOISE – NATURE OF TAXONOMY

• Instability of taxonomic identifications
• Background information greatly helps
• So does keeping records of changes and their sources

Page 22: ASSESSMENTS-Taxonomic-Assessments-Javier

ASSESSMENTS

• Aims of taxonomic assessments
  ◦ Correct issues
  ◦ Reconcile taxonomies
  ◦ Complete hierarchies
• Basic general process: controlled name list
  ◦ Take a name
  ◦ Check whether it exists in a reliable list of names
  ◦ Extract related information
  ◦ Apply it to our dataset
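
The four steps of the basic process can be sketched in Python as follows. The `NAME_LIST` contents, the record layout, and the `assess` function are hypothetical; a real controlled list would be loaded from one of the sources of data discussed next.

```python
# Hypothetical controlled name list: name -> related information.
NAME_LIST = {
    "Pipistrellus": {"family": "Vespertilionidae", "order": "Chiroptera"},
}


def assess(record, name_list=NAME_LIST):
    """Complete a record from the name list; return (new_record, matched)."""
    completed = dict(record)               # never modify the input
    name = completed.get("genus", "")      # 1. take a name
    info = name_list.get(name)             # 2. check the reliable list
    if info is None:
        return completed, False
    for field, value in info.items():      # 3. extract related information
        if not completed.get(field):       # 4. apply it, filling gaps only
            completed[field] = value
    return completed, True


print(assess({"genus": "Pipistrellus", "family": ""}))
```

Existing values are left alone, so the step completes hierarchies without silently overwriting what the data holder recorded. Reconciling a conflicting value is a separate decision.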

Page 25: ASSESSMENTS-Taxonomic-Assessments-Javier

ASSESSMENTS – SOURCES OF DATA

• General databases
  ◦ Ideally, global high-quality information
  ◦ Not complete
  ◦ Rely on taxon-specific sources and their completeness
• Thematic databases and regional checklists
  ◦ If our collection is taxon-specific or location-specific
  ◦ Gather all available knowledge on their topic
  ◦ Reliable authoritative sources
• Taxonomic literature
  ◦ Most specific source
  ◦ Very high reliability
  ◦ Hard to retrieve relevant literature
  ◦ Some processing needed

Page 26: ASSESSMENTS-Taxonomic-Assessments-Javier

ASSESSMENTS – REQUIREMENTS

• Free of misspellings
  ◦ Ab initio, or reduced to the minimum
  ◦ Some tools (Refine, Excel processing…) can accomplish this
  ◦ Taxonomic reconciliation depends on this requirement
• Completeness
  ◦ At least to a certain point
  ◦ The minimum is the scientific name
  ◦ But the scientific name alone might not be enough
• Helpful metadata
  ◦ Not related to the organism, but to the process of identification
  ◦ The person who made the identification, the taxonomic classification used

Page 27: ASSESSMENTS-Taxonomic-Assessments-Javier

ASSESSMENTS – METHODS

• Manual
  ◦ Removing inconsistencies, updating wrong information
  ◦ Taxonomy is an interpretation of explicit and implicit knowledge
  ◦ Explicit knowledge: records
  ◦ Implicit knowledge: human deduction
  ◦ Machines are not good at interpreting implicit knowledge
  ◦ Prone to errors. Automated approach recommended
• Automatic
  ◦ Large amounts of data
  ◦ Repetitive tasks
  ◦ Removal of misspellings, checking against a source, updating
  ◦ Only explicit knowledge. Explicit metadata mandatory

Page 28: ASSESSMENTS-Taxonomic-Assessments-Javier

ASSESSMENTS - SEQUENCE

Page 29: ASSESSMENTS-Taxonomic-Assessments-Javier

VALIDATION

• After cleaning, validate the output
• Check:
  ◦ The data that has been corrected
  ◦ The data that could not be corrected
  ◦ The data that might have got worse
• Taxonomic validation:
  ◦ Requires expertise
  ◦ Mixture of explicit and implicit knowledge
  ◦ Not completely automatable
• If assessments fail:
  ◦ Our data: document and report reliability
  ◦ Distributed data: flag and report
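
The three checks can be partly automated by comparing names before and after cleaning. This Python sketch assumes that "valid" simply means "present in a reference list", which covers only the explicit part of validation. The `validate` function and the sample data are hypothetical.

```python
def validate(original, cleaned, valid_names):
    """Sort each (before, after) pair into one of the three checks."""
    report = {"corrected": [], "not_corrected": [], "worsened": []}
    for before, after in zip(original, cleaned):
        was_ok = before in valid_names
        is_ok = after in valid_names
        if not was_ok and is_ok:
            report["corrected"].append((before, after))
        elif not was_ok and not is_ok:
            report["not_corrected"].append((before, after))
        elif was_ok and not is_ok:
            report["worsened"].append((before, after))
        # valid before and valid after: nothing to report
    return report


valid = {"Pipistrellus", "Myotis"}
original = ["Pipistrelus", "Pipistrellus", "Myotiss"]
cleaned = ["Pipistrellus", "Pipistrella", "Myotiss"]
report = validate(original, cleaned, valid)
print(report)
```

The `not_corrected` and `worsened` lists are the ones to hand to an expert or to flag and report. A name that passes this check can still be a misidentification, which no list lookup will reveal.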