Concepts and Tools Needed to Increase Bottom-Up Taxonomic Expert Participation in a Global Names-Based Infrastructure
Nico Franz, David Patterson, Sudhir Kumar & Edward Gilbert
School of Life Sciences, Arizona State University
TDWG 2013 Annual Conference, Florence, Italy
Developing a Names-Based Architecture for Linking Biodiversity Data
October 31, 2013
Slides @ http://taxonbytes.org/tdwg-2013-concepts-and-tools-needed-for-taxonomic-expert-participation-in-a-global-names-based-infrastructure/

Franz Et Al - Concepts and Tools Needed to Increase Bottom-Up Taxonomic Expert Participation in a Global Names-Based Infrastructure

DESCRIPTION

We discuss the perceived requirements – conceptual, technical, and social – for creating a “Taxonomic Clearing House” (TCH) that will enfranchise and enhance contributions by individual taxonomic experts and collaboratives in a global, names-based infrastructure. In terms of scale, such an infrastructure must be suited to assembling, retrieving, and editing contemporary taxonomic and phylogenetic classifications that involve some 22 million name strings representing 2.3 million living and extinct species, and it must serve diverse contributor and user communities including 6,000-40,000 experts, 400,000 biologists, and more than 100 million citizen scientists. Existing classification synthesis platforms fall short of this grand challenge because they (1) may be limited to living or fossil taxa, (2) fail to show alternative points of view, (3) fail to integrate molecularly defined entities (“dark taxa”), (4) do not automatically monitor new data, (5) lack scalable solutions for providing feedback and credit, (6) have slow revisionary processes, (7) lack effective machine-to-machine services, or (8) cannot represent finer-grained insights such as evolving taxonomic concepts. Jointly, these factors can disconnect the expert community, which leads the global, piecemeal process of advancing classifications, from the large-scale platforms that purport to represent and unify its individual contributions. A suitable TCH should counteract this by acting as an open communal environment that allows expert contributors to jointly assemble and edit evolving taxonomic and phylogenetic content leading to large-scale classifications.
In particular, it must (1) engage major collaborating taxonomic and phylogenetic initiatives and facilitate diverse information flow; (2) expand information acquisition capabilities to harvest names and classifications from diverse sources; (3) create a powerful interface for taxonomic editing, including a topology assembly and visualization layer, nomenclatural and taxonomic editing layers, a Filtered Push-based service (http://wiki.filteredpush.org/wiki/) for submitting, tracking, and accrediting edits to expert contributors, and taxonomically intelligent alerts; and (4) leverage these efforts towards a “Union” reference classification holding two million taxa and multiple alternative perspectives as indicated. To promote engagement and acceptance, a TCH should target existing expert communities such as contributors to the Symbiota collections or TimeTree phylogenetics platforms. The presentation will introduce the elements of this TCH vision and assess their merits, current progress, and challenges towards realization.

Page 1

Concepts and Tools Needed to Increase

Bottom-Up Taxonomic Expert Participation

in a Global Names-Based Infrastructure

Nico Franz, David Patterson, Sudhir Kumar & Edward Gilbert

School of Life Sciences, Arizona State University

TDWG 2013 Annual Conference, Florence, Italy

Developing a Names-Based Architecture for Linking Biodiversity Data

October 31, 2013

Slides @ http://taxonbytes.org/tdwg-2013-concepts-and-tools-needed-for-taxonomic-expert-participation-in-a-global-names-based-infrastructure/

Page 3

Arizona State University's current GN involvement

Biodiversity Informatics @ ASU
http://taxonbytes.org/informatics

http://globalnames.fulton.asu.edu

http://www.globalnames.org/

http://pinkava.asu.edu/starcentral/custar/portal.php

Concept/proposal of a GN

Taxonomic Clearing House (TCH)

Page 4

Two motivating quotes – 1. Counteract disenfranchisement

• "My belief is that the taxonomic community feels *disenfranchised* – in various ways – and we MUST change that, in a tangible manner. [The Commissioners] do whatever we can to interact with the broader community […] to help demystify the Code and improve the perception of the Commission."

• "My own personal vision is far more than that, however: I am convinced that we have a culture of taxonomists many of whom do not understand the Code, or outright oppose it (or parts thereof, such as gender agreement), and that the BEST way to get them to care about the Code is to give them an actual voice. In effect, we need to deputize them – offer a role in which every taxonomist is given a measure of authority, of control."

• "Not replacing ALL of the functions and duties of the Commission, but redesigning the process so each and every taxonomist has a direct, personal stake in the enterprise (to the extent that they choose to exercise that privilege)."

– ICZN Commissioner, 2013 (to D. Patterson)

TCH concept supposes that one can:

• Replace Code with major aggregator project or perspective (such as CoL), and

• Replace Commission with project leadership,

and retain a sense of truth. Hence – empower individual experts.

Page 5

Two motivating quotes – 2. Build for the taxonomic process

• "There is a shared awareness among taxonomists that "outside communities" would like usable, precise classifications to apply to their research challenges. However this reasonable demand is not the same as asking for a single, semi-arbitrarily flattened view that does not actually represent the underlying complexities."

• "Many taxonomy users are aware that their current system in use is ephemeral. There are valid pressures to improve long-term data integration, and *that* is what many users will value over having a single system."

• "Mandating a single view should never work as something that can fairly represent and attract taxonomic research and progress. […] These are in my view worthwhile challenges that address the demands for representing taxonomic discourse and progress as well as serving the user communities with better integrated and less ephemeral products."

– NMF, Aug. 2013, on Taxacom ("Global Species Lists and Taxonomy" thread)

TCH concept includes a taxonomic editing layer ("GNITE") that supports:

• Multiple, partial, alternative classifications and phylogenies (a.k.a. "the process");

• Concepts, relationships, and visualizations of given/inferred concept provenance.

Hence – prepare for concept-level semantics, services.

Page 6

Two hurdles to a GN concept-level platform [1]

• "What is a concept? Nobody really understands this."

• "What about concept inflation? This is not scalable."

A way to address: promote semantic, social practices that minimize pitfalls.

DOI:10.1080/14772000.2013.806371

[1] Not exhaustive, or even very fair to people and projects who have dealt with these "hurdles" and have overcome them.

Page 7

What is a concept? – shallow, technical

Source: http://code.google.com/p/darwin-sw/

• Name/Authority works as a most context-neutral (or -vacuous) definition.

• Practical situations facilitate different inference abilities once context is given.

Page 8

Deeper issues – why bother?

• "The soundest motivation for using taxonomic concepts in biology is not merely to improve data management (Berendsohn, 1995) but to increase the semantic precision of taxonomic names (Franz et al., 2008)."

• "We suggest that this approach should be pursued if and where the (not inconsiderable) cost of doing so is offset by yielding better integration of taxonomically labeled biological information, and therefore better biological inferences."

– Franz & Cardona-Duque, 2013

Page 12

• "Whenever a name appears in subsequent paragraphs, we transparently signal either:

(1) that this usage refers to a single and specific previous or current concept of that name (sec.); or

(2) that it refers more vaguely to the cumulative history of concepts associated with that name (no additional labeling); or

(3) that we utilize this name in an even more non-committal sense (non-focal), typically as a semantic crutch to help contextualize names whose meanings we actually intend to focus on.

• By consistently specifying the nomenclatural and/or taxonomic context in which names are used (or the inverse), and what expectations towards our readership are implied, we are a step closer to achieving a machine-interpretable annotation of these usages.

– Franz & Cardona-Duque, 2013

Think: intended ability to contribute to SW-type reasoning

Perelleschus O'Brien & Wibmer sec. Franz & O'Brien 2001

Perelleschus O'Brien & Wibmer

Ganglionus O'Brien & Wibmer [non-focal]

Page 13

What are speakers expecting from their (machine, KRR) audience?

• Perelleschus O'Brien & Wibmer sec. Franz & O'Brien 2001: heavy duty semantic reasoning, precise

• Perelleschus O'Brien & Wibmer: some reasoning, gets worse as time increases

• Ganglionus O'Brien & Wibmer [non-focal]: more limited to no reasoning expectation
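
The three labeling conventions on this slide lend themselves to simple machine parsing. The following is a minimal sketch, not the authors' tooling: it assumes that a concept usage is written as "Name sec. Source", that a non-focal usage ends in "[non-focal]", and that anything else is a plain name usage. The names `NameUsage`, `classify_usage`, and `EXPECTATION` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Reasoning expectation per usage kind, taken from the slide wording.
EXPECTATION = {
    "concept": "heavy duty semantic reasoning, precise",
    "name": "some reasoning, gets worse as time increases",
    "non-focal": "more limited to no reasoning expectation",
}


@dataclass(frozen=True)
class NameUsage:
    name: str            # name string, including the nomenclatural authority
    sec: Optional[str]   # "according to" source; only set for concept usages
    kind: str            # "concept", "name", or "non-focal"


# Assumed surface forms (hypothetical, based only on the slide examples).
_NON_FOCAL = re.compile(r"^(?P<name>.+?)\s*\[non-focal\]$")
_SEC = re.compile(r"^(?P<name>.+?)\s+sec\.\s+(?P<sec>.+)$")


def classify_usage(label: str) -> NameUsage:
    """Classify a label as a concept, name, or non-focal usage."""
    text = label.strip()
    match = _NON_FOCAL.match(text)
    if match:
        return NameUsage(match.group("name"), None, "non-focal")
    match = _SEC.match(text)
    if match:
        return NameUsage(match.group("name"), match.group("sec"), "concept")
    return NameUsage(text, None, "name")


if __name__ == "__main__":
    for label in (
        "Perelleschus O'Brien & Wibmer sec. Franz & O'Brien 2001",
        "Perelleschus O'Brien & Wibmer",
        "Ganglionus O'Brien & Wibmer [non-focal]",
    ):
        usage = classify_usage(label)
        print(f"{usage.kind}: {EXPECTATION[usage.kind]}")
```

Checking the non-focal marker before the "sec." pattern is a design choice of this sketch; the source does not say how a label carrying both markers should be treated.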

Page 14

Putting concepts, names, [non-focal] to use in a new classification

Page 15

With conventions in place, we can compartmentalize & innovate

• Perelleschus (2013) revision combines name/concept taxonomy organically

(1) Concept

(2) Name

(3) Non-focal

• Concept taxonomy "cuts through" any separation of classification vs. phylogeny; though outgroups may be viewed as [non-focal].

Page 17

Consistency – maximize concepts when possible, minimize names

Key to species-level concepts, old & new names

Distribution map

Figure showing specimens, traits

New species, diagnosis

Names are essentially restricted to Introduction/Discussion, i.e. when the entire taxonomic history related to a name is referred to.

As an expert aware of context at all times, I can almost omit them (not so with [non-focal] cases, which are needed).

Page 18

Concepts for ranked Linnaean names, focal & non-focal clades

Phylogenetic characters, concepts for clades

Phylogenetic character matrix

Phylogenetic tree

Page 19

Historically endorsed concepts are readily flagged as such

• Revision includes complete circumscriptions for 54 related concepts, 1936-2013

Page 20

Represent all pertinent prior & current classifications & phylogenies

Figure panel labels (years): 1936, 1954, 1986, 1986, 2006, 2013

= "carludovicae" (name), cumulative history

Page 21

Reasoning over concept evolution

Page 22

Get ready for taxonomic KRR, I: identifying individual concepts

• Name Perelleschus contributes to 5 concepts; sec. 1954, 1986, 2001, 2006, 2013
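
The one-name-to-many-concepts relationship stated above can be made concrete with a small index. This is an illustrative sketch only: it assumes concept labels of the form "Name sec. Source" and uses the bare years from the slide as stand-ins for full source citations. `index_concepts` is a hypothetical helper name.

```python
from collections import defaultdict


def index_concepts(concept_labels):
    """Group 'Name sec. Source' labels by their name part."""
    index = defaultdict(list)
    for label in concept_labels:
        name, _, source = label.partition(" sec. ")
        if not source:
            raise ValueError(f"not a concept label (missing 'sec.'): {label!r}")
        index[name].append(source)
    return dict(index)


# Years as listed on the slide; a full label would cite author(s) and year.
labels = [f"Perelleschus sec. {year}" for year in (1954, 1986, 2001, 2006, 2013)]
index = index_concepts(labels)
print(len(index["Perelleschus"]), "concepts share the name Perelleschus")
```

The point of the index is that a lookup by name alone returns several candidate concepts, so a record labeled only with the name cannot be resolved to one of them without its "sec." source.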

Page 23

Get ready for taxonomic KRR, II: assemble classifications (P/C)

Page 24

Get ready for taxonomic KRR, III: express concept articulations

• Articulations use Franz & Peet (2009) [1] terms, which significantly improve upon TDWG-TCS

[1] Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5-20.

Page 25

Concept resolution and merge taxonomy visualization via Euler/X

2013 higher-level concepts

2001 higher-level concepts

2013/2001 species concepts

Euler/X uses ASP reasoning, RCC

• Reads in 3 concept tables

• Logic / consistency check

• Inconsistency explanation

• Provenance, repair options

• Max. inform. relations (mir)

• Merge taxonomy visualization

• More in SfB – Formal Models

Euler project URL: https://sites.google.com/site/eulerdi/home
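
To illustrate the kind of relation that such articulations express, the sketch below computes a relation between two concepts from their member sets. It assumes that "RCC" on the slide refers to the five base relations of the RCC-5 calculus (equals, includes, is included in, overlaps, disjoint). It is a toy set comparison and not how Euler/X works, since Euler/X reasons with ASP over expert-provided articulations. The member identifiers are hypothetical.

```python
from enum import Enum


class RCC5(Enum):
    EQUALS = "equals"
    INCLUDES = "includes"
    INCLUDED_IN = "is included in"
    OVERLAPS = "overlaps"
    DISJOINT = "disjoint"


def rcc5_relation(a: frozenset, b: frozenset) -> RCC5:
    """Return the RCC-5 relation of extension a to extension b."""
    if not a or not b:
        raise ValueError("concept extensions must be non-empty")
    if a == b:
        return RCC5.EQUALS
    if a > b:           # proper superset
        return RCC5.INCLUDES
    if a < b:           # proper subset
        return RCC5.INCLUDED_IN
    if a & b:           # shared members, neither contains the other
        return RCC5.OVERLAPS
    return RCC5.DISJOINT


# Hypothetical member identifiers (e.g. specimens or child concepts).
earlier = frozenset({"m1", "m2"})
later = frozenset({"m1", "m2", "m3"})
print(rcc5_relation(later, earlier).value)
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what makes them usable as a mutually exclusive vocabulary for comparing concepts.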

Page 28

Interim conclusions – concepts provide valuable TCH services

• The core semantics and prototype tools are in place to:

1. Handle both novel nomenclatural and taxonomic/phylogenetic data via small (to large), incremental expert submissions to a suitable TCH.

2. Concepts allow the new submissions of taxonomic effort and progress to:

   1. Be identified as such (as are their individual authors).

   2. Be delimited from imprecise or existing information.

   3. Be semantically represented (parent/child hierarchies).

   4. Be logically integrated with relevant previous concepts (Euler/X).

   5. Be visualized in merge taxonomies that resolve provenance.

• Jointly these services are needed to (1) counter disenfranchisement, (2) build for the taxonomic process, and (3) facilitate better inferences in biology.

Page 29

How might this work in a TCH?

Page 30

Focus new development on the GN Interface for Taxonomic Editing

• Prototyped for GN1 (U.S.) by Mozzherin, Patterson & Shorthouse at MBL.

• Needs new functionality, interoperability, and a user community.

Upgrades to a native GN taxonomy editing layer are just one part of a grander TCH infrastructure and service package.

Page 31

TCH infrastructure

Page 32

TCH infrastructure

"Run" by experts, individually, in groups.

Page 33

TCH infrastructure

Taxonomists, phylogeneticists work within "native" platforms.

Page 34

TCH infrastructure

Strategy: initial establishment with select expert communities.

Page 35

TCH infrastructure

Capitalizing on existing, diversified GN1 infrastructure and services.

Page 36

TCH infrastructure

Expand GNITE into 3 powerful layers for single classification assembly, nomenclatural editing, and concept taxonomy

Page 37

TCH infrastructure

Build a Filtered Push (FP) "Lite" system to track all TCH submissions, edits, and assignments of authorship, and to track expert credit profiles
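
What such tracking might involve can be sketched with a small in-memory ledger. This is a hypothetical illustration and not the Filtered Push API or any existing GN service. The class names, field names, and action strings are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Edit:
    expert: str            # who submitted the edit (hypothetical identifier)
    concept: str           # which concept label the edit targets
    action: str            # free-text description of the change
    submitted_at: datetime


@dataclass
class EditLedger:
    edits: List[Edit] = field(default_factory=list)

    def submit(self, expert: str, concept: str, action: str) -> Edit:
        """Record an edit with its author and a UTC timestamp."""
        edit = Edit(expert, concept, action, datetime.now(timezone.utc))
        self.edits.append(edit)
        return edit

    def credit_profile(self) -> Counter:
        """Count recorded edits per expert."""
        return Counter(edit.expert for edit in self.edits)

    def history(self, concept: str) -> List[Edit]:
        """Return all edits for one concept, in submission order."""
        return [edit for edit in self.edits if edit.concept == concept]


ledger = EditLedger()
ledger.submit("expert_a", "Perelleschus sec. 2013", "add child concept")
ledger.submit("expert_b", "Perelleschus sec. 2013", "annotate")
ledger.submit("expert_a", "Perelleschus sec. 2001", "link to later concept")
print(ledger.credit_profile().most_common())
```

Storing each edit as an immutable record keeps authorship and ordering intact, so credit counts and per-concept histories are derived from the same log rather than maintained separately.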

Page 38

TCH infrastructure

GN "Union" = endorsed classification, although multiple alternatives are an essential part of TCH output service; "intelligent alerts" notify experts

Page 41

Conclusions – unless we build for the process, products will suffer

• "TCH embodies the view that improving existing classification repositories and services is very much a matter of improving their ability to accommodate the systematic research and publication process."

• "It is not just a matter of gathering more classifications into static structures with limited options for expert access, editing, and classification provenance tracking."

• "Rather, it is about bottom-up collaboration that allows merger, critical input, refinement, and due recognition of, and respect for, a diversity of views that will lead to evolving authoritative taxonomic compilations."

Page 42

Acknowledgments

• TDWG 2013 Symposium organizers – Yde de Jong & Richard L. Pyle

• GN1 team – Dmitry Mozzherin, Richard Pyle, David Shorthouse, Robert Whitton

• Euler team, UC Davis – Bertram Ludäscher, Mingmin Chen, Shizhuo Yu, Shawn Bowers

• Juliana Cardona-Duque – Universidad de Antioquia, Medellín, Colombia

• Steven Baskauf (concept/occurrence graph) – Vanderbilt University

http://taxonbytes.org
https://sols.asu.edu