Virtual Biodiversity ViBRANT, e-infrastructure, SEVENTH FRAMEWORK PROGRAMME. A decadal view of biodiversity informatics: challenges and priorities. Alex Hardisty, Dave Roberts, and the biodiversity informatics community* (* 80 people took part in the open debate that led to this paper)

DESCRIPTION

Description of White Paper published in BMC Ecology

Page 1: Hardisty roberts 190313_opt

Virtual Biodiversity ViBRANT

e-infrastructure SEVENTH FRAMEWORK PROGRAMME

A decadal view of biodiversity informatics: challenges and priorities

Alex Hardisty, Dave Roberts, and the biodiversity informatics community*

* 80 people took part in the open debate that led to this paper

Page 2: Hardisty roberts 190313_opt

“We are drowning in information, while starving for wisdom. The world henceforth will be run by synthesizers, people able to put together the right information at the right time, think critically about it, and make important choices” E. O. Wilson, Harvard

Page 3: Hardisty roberts 190313_opt

Time to model all life on Earth.

Purves et al. (2013) Nature, 493: 295-297

Page 4: Hardisty roberts 190313_opt

An infrastructure to allow the available data to be brought into a coordinated coupled modelling environment, capable of addressing questions relating to our use of the natural environment, that captures the variety, distinctiveness and complexity of all life on Earth

The Grand Challenge for Biodiversity Informatics

To achieve it we need:

To build user confidence

Integrative flexible e-Science environments

Predictive models across multiple scales, coupled

Page 5: Hardisty roberts 190313_opt

1. Open Data should be normal practice;

Page 6: Hardisty roberts 190313_opt

2. Data encoding should allow analysis across multiple scales;

Page 7: Hardisty roberts 190313_opt

3. Infrastructure projects should devote significant resources to market the service they develop;

Page 8: Hardisty roberts 190313_opt

Courtesy of ‘Linking Global Names and Pro-iBiosphere’, 2013, D. J. Patterson

AAAAAGCTCGTAGTTGGATTTGTGATGGAATTTGAATACTTTTAAAGTGTTCTAGAAACTGTCATCCGTGGGTGGAATTTGTTTGGCATTAGGTTGTCAGRCAGAGGATGCCTATMCTTTACTGTGAAAAAATCAGTGCGTTCAAAGCAGACTTACGTCGATGAATGTATTAGCATGGAA

Didimosphenia geminata

TGTATTTATTTA AGTTAGTT

didymo

Gomphonema vulgare Bréb.

Echinella geminata

Didymosphenia geminata (Lyngbye) Schmidt 1899

D. geminata

Didymosphenia geminata

Didymosphenia geminata (Lyngbye)

didymo

Rock Snot

Page 9: Hardisty roberts 190313_opt

Difficulties with Latinized Names: transcription errors

Actinobacillus actimomycetemcomitans, Actinobacillus actimycetemcomitans, Actinobacillus actinmycetemcomitans, Actinobacillus actinomicetemcomitans, Actinobacillus actinomy, Actinobacillus actinomyce, Actinobacillus actinomycemcomitans, Actinobacillus actinomyceremcomitans, Actinobacillus actinomycetam, Actinobacillus actinomycetamcomitans, Actinobacillus actinomycetecomitans, Actinobacillus actinomycetemcmitans, Actinobacillus actinomycetemcomintans, Actinobacillus actinomycetemcomitance, Actinobacillus actinomycetemcomitans, Actinobacillus actinomycetemcomitants,

Actinobacillus actinomycetemcommitans, Actinobacillus actinomycetemocimitans, Actinobacillus actinomycetencomitans, Actinobacillus actinomycetum, Actinobacillus actinomyctemcomitans, Actinobacillus actinomyectomcomitans, Actinobacillus actinomyetemcomitans, Actinobacillus actinonmycetemcomitans, Actinobacillus actionomycetemcomitans, Actinobacillus actynomicetemcomitans, Actinobacillus antinomycetemcomitans

Nomenclator provides correct spelling. Indexing infrastructure resolves to it.

Courtesy of Dave Remsen, GBIF

Names as strings of characters…

4. A list of taxon names
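The idea that a nomenclator holds the correct spelling and an indexing layer resolves variants to it can be sketched in a few lines of Python. This is only an illustration, not the indexing infrastructure the slide refers to: the two-entry `NOMENCLATOR` list, the `resolve_name` helper and the `cutoff` value are all assumptions, and the matching uses generic string similarity from the standard library rather than any taxonomy-aware algorithm.

```python
from difflib import get_close_matches

# Hypothetical nomenclator: a tiny list of correctly spelt names.
NOMENCLATOR = [
    "Actinobacillus actinomycetemcomitans",
    "Didymosphenia geminata",
]


def resolve_name(raw, nomenclator=NOMENCLATOR, cutoff=0.8):
    """Return the closest correctly spelt name, or None if nothing is close enough.

    The cutoff of 0.8 is an arbitrary illustrative threshold.
    """
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    matches = get_close_matches(cleaned, nomenclator, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A few of the transcription errors listed on the slide.
variants = [
    "Actinobacillus actimomycetemcomitans",
    "Actinobacillus actinomycetemcomitants",
    "Actinobacillus antinomycetemcomitans",
]
resolved = {variant: resolve_name(variant) for variant in variants}
```

Each of these variants resolves to the single correct spelling, while a vernacular name such as "Rock Snot" returns `None`: string similarity cannot link a common name to a scientific one. Heavily truncated strings may also fall below the cutoff and would need separate handling.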

Page 10: Hardisty roberts 190313_opt

DOI: 10.4289/0013-8797.115.1.75

5. Persistent Identifiers

Page 11: Hardisty roberts 190313_opt

6. Author identifiers

Page 12: Hardisty roberts 190313_opt

7. 3rd party authentication

Page 13: Hardisty roberts 190313_opt

8. Classification Bank

Embley & Stackebrandt (1994) Bergey’s Manual, 2nd Edition (2012)

Actinomycetes: the antibiotic factories

Atopobium minutum

Sphaerobacter thermophilus

strain TH3

Bifidobacteriaceae

Actinomycetaceae

Arthrobacteriaceae, Cellomonadaceae, Microbacteriaceae, Dermatophilaceae and relatives

Propionibacteriaceae

Nocardioidaceae

Frankiaceae

Corynebacteriaceae, Mycobacteriaceae, Nocardiaceae and relatives

Actinoplanaceae

Pseudonocardiaceae

Streptomycetaceae, Streptosporangiaceae and relatives

Insertion element in 23S rRNA

Page 14: Hardisty roberts 190313_opt

9. Accepted names

Catalogue of Life website (screenshot): "The most comprehensive and authoritative global index of species currently available, the Catalogue of Life consists of a single integrated checklist and taxonomic hierarchy for all the world's species."

The Dynamic Edition is a constantly evolving version of the Catalogue of Life, now tracking 70% of species known to science (1,315,754 species). The Annual Checklist is a snapshot of the entire Catalogue of Life: a fixed imprint.

© 2013, Species 2000 at University of Reading

IPNI

Page 15: Hardisty roberts 190313_opt

10. Tools to make LOD

Implicit semantics: “Compound 2a melted at 119 °C”

Humans are good at interpreting this.

Explicit semantics (CML Schema). Machines need this:

    <cml:molecule ref="2a">
      <cml:property>
        <cml:scalar dictRef="prop:mpt" units="units:celsius" dataType="xsd:float">119</cml:scalar>
      </cml:property>
    </cml:molecule>

Molecules in CML/InChI; property dictionary; units dictionary; W3C Schema

4 namespaces, 3 dictionaries
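To illustrate why explicit semantics matter to software, the CML fragment from this slide can be read with Python's standard XML parser, recovering the value together with its property and units references. This is a minimal sketch: the slide does not show namespace declarations, so every namespace URI below (including the `example.org` dictionary URIs) is an assumption added so the fragment parses, and `read_scalars` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions added for illustration;
# the slide shows only the prefixed element and attribute names.
CML_DOC = """
<cml:molecule ref="2a"
    xmlns:cml="http://www.xml-cml.org/schema"
    xmlns:prop="http://example.org/dict/property"
    xmlns:units="http://example.org/dict/units"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <cml:property>
    <cml:scalar dictRef="prop:mpt" units="units:celsius" dataType="xsd:float">119</cml:scalar>
  </cml:property>
</cml:molecule>
"""

NS = {"cml": "http://www.xml-cml.org/schema"}


def read_scalars(xml_text):
    """Return one dict per cml:scalar, keeping the value with its semantics."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for scalar in root.iterfind(".//cml:scalar", NS):
        results.append({
            "molecule": root.get("ref"),        # which compound
            "property": scalar.get("dictRef"),  # what was measured
            "units": scalar.get("units"),       # in which units
            "value": float(scalar.text),        # the number itself
        })
    return results


measurements = read_scalars(CML_DOC)
```

Unlike the free-text sentence, nothing here has to be guessed: the compound, the property, the units and the numeric type are each stated in a form a program can look up in a dictionary.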

Page 16: Hardisty roberts 190313_opt

Most textbooks will tell you that, in 1610, Galileo Galilei became the first person to observe Saturn's rings.

But what did he really see?

GBIF/GBIC – 2-4 Jul 2012 – Copenhagen, © 2012, R. J. Robbins

Page 17: Hardisty roberts 190313_opt

This?

Page 18: Hardisty roberts 190313_opt

Or this?

Page 19: Hardisty roberts 190313_opt

The generation of important new insights while handicapped with limited technology, indirect measurement, and fuzzy data is the mark of scientific greatness.

Page 20: Hardisty roberts 190313_opt

11. Data fit for purpose

Data are received at face value, examined and tested. If the user is satisfied, then the data will be applied.
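One way to picture "examined and tested" is a set of checks chosen by the user for their own purpose, applied to each record before the data are used. The sketch below is an assumption about how that might look, not a method described in the slides: `fit_for_purpose`, the three checks and the two sample records are invented for illustration.

```python
def fit_for_purpose(records, checks):
    """Split records into accepted and rejected using user-supplied checks.

    Rejected records are returned with the names of the checks they failed.
    """
    accepted, rejected = [], []
    for record in records:
        failed = [name for name, check in checks.items() if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical checks for a user who needs named, georeferenced records.
checks = {
    "has_name": lambda r: bool(r.get("name")),
    "valid_latitude": lambda r: r.get("lat") is not None and -90 <= r["lat"] <= 90,
    "valid_longitude": lambda r: r.get("lon") is not None and -180 <= r["lon"] <= 180,
}

# Invented sample records, for illustration only.
records = [
    {"name": "Didymosphenia geminata", "lat": 55.7, "lon": 12.6},
    {"name": "", "lat": 95.0, "lon": 12.6},
]

accepted, rejected = fit_for_purpose(records, checks)
```

The checks belong to the user rather than the data provider, which reflects the slide's point: the same data set may be fit for one purpose and unfit for another.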

Page 21: Hardisty roberts 190313_opt

12. Observational data infrastructure

Agriculture Systems

Climate Change

Forest Management

Invasion Biology

Urban Ecosystems

http://www.teamnetwork.org

http://www.earthobservations.org/geobon.shtml

http://www.eubon.eu

http://www.neoninc.org

http://mooreabiocode.org/

Moorea Biocode Project

http://www.ilternet.edu

Page 22: Hardisty roberts 190313_opt

To build user confidence

Thus far, all projects share a common problem: keeping services running after project funding ends

New models are needed

To create translational pipelines to industry adoption

To encourage institutional adoption for care and maintenance

For recognition of contribution other than through publication of academic papers

Stronger marketing and outreach

Invest more in up-skilling and hand-holding

Page 23: Hardisty roberts 190313_opt

Integrative flexible e-Science environments

Using standardised building blocks and workflows

Interoperable components

With access to data from multiple sources

Recognise different kinds of VRE

General-purpose / specialised / single scientific objective

- cf. chemistry laboratory vs forensics lab vs HIV vaccine lab

- BioVeL / AquaMaps and iMarine / CarbonWaterCloud

Must generate immediate benefit for users

Science driven, with scientists as active participants in creation of infrastructure

Functions people find useful: simple and intuitive

Technology invisible (disappears into background)

Page 24: Hardisty roberts 190313_opt

Predictive models across multiple scales

A new framework of methods, techniques, standards to bring about interoperability of data and models across different biological scales

From genetic through species and ecosystem to landscape

Learn from the Virtual Physiological Human and from numerical weather prediction and climatology: Edwards (2010), A Vast Machine

“General Ecological Models” Purves et al. (2013). doi:10.1038/493295a

Evolvable to incorporate new scientific insights

Re-analysis models

Making data we have global

Implies ‘inversion’ of existing infrastructure

‘Inversion’ of existing infrastructure is about re-examining every element of data we have in order to reconstruct past biodiversity, as a guide and calibrator of models that can predict the future

Page 25: Hardisty roberts 190313_opt

Section 1: The fundamental backbone (getting the basics right)

Section 2: The next steps

Section 3: New tools

Section 4: The human interface

1. Why are names important?
2. How are names organised?
3. Which is the right name?
4. What is the name of that organism?
5. Can biodiversity studies be done without names?
6. Biodiversity data beyond names
7. To link resources we need identifiers
8. Centralised or networked services?
9. How to balance professional and non-professional contributions
10. Engagement of users
11. Who's who?
12. User identification
13. How do we ensure the right metadata are created at the point of data generation?
14. Sustaining the physical infrastructure
15. Data Sharing
16. Why do we need vocabularies and ontologies?
17. How would Knowledge Organising Systems help?
18. How easy is it to integrate data?
19. Beyond Sharing and Re-use: the problem of scale
20. How reliable are the data?
21. What will the physical infrastructure look like?
22. How much of the legacy collections can be digitised?
23. How to generate more targeted and reliable data?
24. What role do mobile devices play?
25. How do you find the data you need?
26. How do you extract the data you need?
27. How do you aggregate the data you need?
28. How complete are the data?
29. How can we encourage virtual research environments?
30. What can you do with your data in the future?
31. How can we give users confidence?
32. Who owns what?
33. What benefits come to contributors?

Page 26: Hardisty roberts 190313_opt

Thank you for your attention.

Any questions?

Alex Hardisty <[email protected]>

Dave Roberts <[email protected]>

http://www.biovel.eu http://vbrant.eu

Hardisty et al. (2013) A decadal view of biodiversity informatics: challenges and priorities. BMC Ecology (in press)

http://www.slideshare.net/vibrantmanager/hardisty-roberts-190313opt