Dimitris Koureas, PhD Natural History Museum London Linking layers of biodiversity data: Informatics challenges for the long tail research RDA - Long Tail IG breakout session Amsterdam, 23 Sep 2014

Page 1

Dimitris Koureas, PhD
Natural History Museum London

Linking layers of biodiversity data:
Informatics challenges for the long tail research

RDA - Long Tail IG breakout session
Amsterdam, 23 Sep 2014

Page 2

The problem – Capturing and integrating biodiversity data

How do we join up these activities? How do we use this as a tool?

Species conservation & protected areas

Impacts of human development

Biodiversity & human health

Impacts of climate change

Food, farming & biofuels

Invasive alien species

What infrastructures do we need? (technologies, tools, standards…)

What processes do we need? (Modelling, workflows…)

What data do we need? (Genes, localities…)

Page 3

LinkD

Challenge 1: mobilising data at all scales

Page 4

LinkD

Challenge 2: linking & aggregating data at different scales

National Efforts: c. 5M (e.g. NHM Data Portal)

Communities: c. 50k (e.g. Scratchpads)

Global Efforts: c. 500M (e.g. GBIF Data Portal)

Page 5

LinkD

Challenge 3: Synthesising data, e.g. modelling human pressures on biodiversity

www.predicts.org.uk

Projecting Responses of Ecological Diversity In Changing Terrestrial Systems

2M records, 19k sites, 34k spp.

Management Practices

Ecosystems Agro-systems

Small aggregated datasets

Species richness in different ecosystems

• Land-use change
• Pollution
• Invasive species
• Infrastructure

Models to predict how biodiversity responds to human pressures
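
As a rough illustration of the synthesis step on this slide, the sketch below pools a few small site-level datasets and counts distinct species per land-use class. The record layout, function name and sample values are invented for illustration; they are not the PREDICTS schema, code or data.

```python
from collections import defaultdict

# Hypothetical small datasets, one per study.
# Each record is (site, land_use, species); all values are made up.
study_a = [
    ("A1", "primary forest", "sp1"),
    ("A1", "primary forest", "sp2"),
    ("A2", "cropland", "sp1"),
]
study_b = [
    ("B1", "primary forest", "sp3"),
    ("B2", "cropland", "sp1"),
    ("B2", "cropland", "sp4"),
]


def richness_by_land_use(*datasets):
    """Pool records from all datasets and count distinct species per land-use class."""
    species = defaultdict(set)
    for dataset in datasets:
        for _site, land_use, sp in dataset:
            species[land_use].add(sp)
    return {land_use: len(found) for land_use, found in species.items()}


richness = richness_by_land_use(study_a, study_b)
print(richness)
```

Pooling by set membership means a species recorded in several studies is counted once per land-use class, which is the minimum needed before any comparison across ecosystems; a real analysis would also have to account for differences in sampling effort between studies.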

Page 6

The problem – integrating biodiversity research

Figure from Costello M.J et al, 2013. doi: 10.1126/science.1230318

Page 7

c. 17,000 new species and subspecies described every year

The problem – integrating biodiversity research

Page 8

Key problems
• Landscape is complex, fragmented & hard to navigate
• Many audiences (policy makers, scientists, amateurs, citizen scientists)
• Many scales (global solutions to local problems)

Figure adapted from Peterson et al 2010

An informatician's view of biodiversity

Page 9

An informatician's view of biodiversity

Investigator-focused 'small data'

Locally generated 'invisible data'

'incidental data'

dark data

20%

80%

Published and discoverable data

Dark data are more important, mainly due to their volume.¹

¹ Heidorn PB. Library Trends 57:280-299

Page 10

Incentives for mobilising long-tail research

Leverage effort and data impact

Increase exposure and citability of work

Provide easy-to-use and long-lasting virtual research environments (VREs)

Promote the culture of openness in science

Page 11

Increase exposure and citability of work

Scholarly data publication

Enable easy publication of data and data descriptors

Link data journals with data sources (repositories, VREs) using common data exchange standards

Small data contributions

Page 12

Leverage effort and data impact

Virtual Research Environments

Empower researchers through development and deployment of service-driven digital research environments

515 Scratchpad Communities

by 6,321 active registered users

covering 176,950 taxa

in 932,296 pages

134 paper citations in 2013

In total more than 2,500,000 visitors

Page 13

Leverage effort and data impact

Long tail data External data & services

Page 14

Leverage effort and data impact

Enable long tail researchers to do science online by processing own data together with data from cross-disciplinary sources

Provide workflows for the processing of data in major areas of biodiversity research: ecological niche modelling, ecosystem functioning, and taxonomy.

The BioVeL approach

Design and Construct – Run – Share and Discover scientific workflows

Page 15

Leverage effort and data impact

A highly dynamic but fragmented landscape

Page 16

Data curation

Data publishing

Data mobilisation & generation

Data analysis

Leverage effort and data impact

Seamless virtual research environments that incentivise mobilisation of long tail research

Page 17

H2020 2015 VRE Proposal: LinkD

Topic: EINFRA-9-2015 Virtual Research Environments

Estimated budget: €8-9 m

Consortium: c. 24 partners

LinkD: Linking data, services and communities for predictive modelling of the biosphere

Deliver a coherent and accessible ecosystem of federated services and deploy a network of research and collaboration enabling tools to support scientific excellence towards the long-term vision of predictive modelling of the biosphere.

Builds upon: ViBRANT | BioVeL | pro-iBiosphere | EU-BON

Strategic links to: ESFRI projects (incl. LifeWatch, ELIXIR)