A World-Wide Network for Metabolomics Data Exchange Christoph Steinbeck European Bioinformatics Institute (EMBL-EBI)

World-wide data exchange in metabolomics, Wageningen, October 2016



The European Bioinformatics Institute (EBI)

The European Molecular Biology Laboratory (EMBL)

A basic research institute funded by public research monies from 20 member states.

European Bioinformatics Institute (EBI)

• Genes, genomes & variation: Ensembl, Ensembl Genomes, European Nucleotide Archive, 1000 Genomes, European Genome-phenome Archive, Metagenomics portal

• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank

• Gene, protein & metabolite expression

• Protein sequences, families & motifs

• Chemical biology

• Reactions, interactions & pathways

• Systems

Understanding Phenotypes

Nutrition

Exercise

Disease

Age

Drugs

Environment

Phenome/Exposome

Reaction times following external change

• Genetics (decades, centuries…)

• Epigenetics (days, months, years, …)

• Gene Expression (hours)

• Metabolism (seconds)

The Metabolome is the most accessible and dynamically changing Molecular Phenotype

Phenome Centres popping up all over the world

• London

• Birmingham

• Shanghai

• NIH RCMRCs

• …

> 100,000 patient samples / year

> Several PetaBytes/year

=> ExaBytes of human data at moderate scale-up
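
The step from petabytes to exabytes can be made concrete with a rough calculation. The sketch below takes the sample count from the slide and assumes, purely for illustration, that "several PetaBytes/year" means 3 PB per year; the function names are hypothetical.

```python
# Back-of-envelope data volumes for phenome centres.
# SAMPLES_PER_YEAR is taken from the slide; PETABYTES_PER_YEAR is an
# assumed stand-in for "several PetaBytes/year", not a measured figure.
SAMPLES_PER_YEAR = 100_000
PETABYTES_PER_YEAR = 3  # assumption for illustration only

BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15
BYTES_PER_EB = 10**18


def gigabytes_per_sample(samples: int, petabytes: float) -> float:
    """Average raw data volume per sample, in gigabytes."""
    return petabytes * BYTES_PER_PB / samples / BYTES_PER_GB


def scale_up_to_exabyte(petabytes: float) -> float:
    """Factor by which the yearly volume must grow to reach one exabyte."""
    return BYTES_PER_EB / (petabytes * BYTES_PER_PB)


if __name__ == "__main__":
    print(gigabytes_per_sample(SAMPLES_PER_YEAR, PETABYTES_PER_YEAR))
    print(scale_up_to_exabyte(PETABYTES_PER_YEAR))
```

Under this assumption each sample averages tens of gigabytes, and a few hundred times more samples per year would be enough to reach the exabyte range.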

How do you make sense of all that data?

Share them: free and open

What do the EBI databases do?

Labs around the world send us their data and we…

• Archive it

• Classify it

• Share it with other data providers

• Analyse it

• Provide tools to help researchers use it

A collaborative enterprise

MetaboLights

http://www.ebi.ac.uk/metabolights

open-access, cross-species, cross-application, long-term supported

Salek, R.M., Haug, K. and Steinbeck, C. (2013) Dissemination of metabolomics results: role of MetaboLights and COSMOS. Gigascience, 2:8.

MetaboLights Database

Experimental Repository

Reference Layer

Chemistry Spectroscopy Biology

Analysis Tools

Primary Literature

Primary data and Meta-Data, Spectra, Protocols, Synopses, ...

www.ebi.ac.uk/metabolights (metabolights.org, metabolights.eu)


Data growth in EBI data repositories

3-month doubling time for Metabolomics
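
To see what a 3-month doubling time implies, the usual exponential-growth relation N(t) = N0 · 2^(t/T) can be evaluated directly. This is a minimal sketch; the function name is hypothetical and the only input taken from the slide is the doubling time.

```python
# Growth implied by a fixed doubling time: N(t) = N0 * 2 ** (t / T).
DOUBLING_TIME_MONTHS = 3  # from the slide


def growth_factor(elapsed_months: float,
                  doubling_time_months: float = DOUBLING_TIME_MONTHS) -> float:
    """Multiplicative increase in data volume after `elapsed_months`."""
    return 2 ** (elapsed_months / doubling_time_months)


if __name__ == "__main__":
    for months in (3, 6, 12, 24):
        print(months, growth_factor(months))
```

If the doubling time held steady, the repository would grow 16-fold in one year (four doublings), which is why such a rate is hard to sustain for long.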

MetaboLights is now the recommended repository for the Nature journals, EMBO journal, PLOS journals, Metabolomics Journal and others

MetaboLights Stats May 2016

How do I submit data?

Sansone,… Steinbeck et al. (2012) Toward interoperable bioscience data. Nature Genetics, 44, 121–126.

Controlled Vocabularies / Ontologies

Minimum Information Standards

Ideally: Laboratory Information Management System

ISA

MetaboLights Metabolomics Workbench

ENA

PRIDE

Global Collaboration in Metabolomics and the BioSciences

COSMOS: COordination of Standards in MetabolOmicS

European FP7 coordination action coordinated by us at EMBL-EBI, Hinxton, Cambridge

• Create missing standards & formats

• Define workflows for dissemination

• Create world-wide data network

MetabolomeXchange 2014

• Global network for exchange and discoverability of metabolomics data

• Includes study as well as reference data

Multi-Omics Data Challenge

EBI BioSamples/BioStudies


The MetaboLights Reference Layer

• 8.7 million eukaryotic species on earth (± 1.3 million)

• 1.2 million species identified and classified

• 3000 - 4000 complete species genomes sequenced

What about completed metabolomes?

Species Metabolomes and How Little We Know

[Figure: bar chart on a logarithmic axis from 1 to 100,000,000 comparing Metabolites in Human, Metabolites in Microbes, Compounds in ChEBI, Metabolites in HMDB, Metabolites in Plants, Compounds in ChEMBL and Compounds in PubChem; labelled values 80,000, 200,000 and 2,000,000.]

There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know.

Donald Rumsfeld, United States Secretary of Defense

Building upon extensive genomics research, we argue that the time is now right to focus intensively on model organism metabolomes. We propose a grand challenge for metabolomics studies of model organisms: to identify and map all metabolites onto metabolic pathways, to develop quantitative metabolic models for model organisms, and to relate organism metabolic pathways within the context of evolutionary metabolomics, i.e., phylometabolomics. These efforts should focus on a series of established model organisms in microbial, animal and plant research.

Metabolites. 2016 Feb 15;6(1)

Species Metabolomes are being assembled on the fly right now through data sharing in Metabolomics

Repository Entry


Reference Layer

7 most annotated metabolomes in MetaboLights

30 most annotated metabolomes in MetaboLights

1600 metabolome sizes in MetaboLights on a log scale

Number of Studies in MetaboLights per Species

A Case for Deep Metabolome Annotation

Help build species metabolomes

• Submit your metabolomics study to MetaboLights

• Submit data publications (e.g. to Scientific Data)

• Be highly cited :)

What’s next?

• 500 million people in the European Union

• Full genomes (soon for less than $1000 per person)

• Urine/blood metabolome < 20 euros per patient


Large Scale Computing with Medical Metabolomics Data

• EBI lead

• H2020

• 3 years

• 13 partners

• 8 million €

• 830 person-months (PM)

• Kick-off 9/15

• H2020 e-infra


What I did not talk about

Computer-Assisted Structure Elucidation (CASE)

Steinbeck C (2004) Recent developments in automated structure elucidation of natural products. Nat. Prod. Rep. 21, 512–518.

Finding the unknown

Limits to Growth

• Deterministic methods suffer from combinatorial explosion

• Prospective use of spectroscopic input information may make them error-intolerant

[Figure: number of constitutional isomers and calculation time plotted against the number of heavy atoms.]

C13H16O3 (16 heavy atoms): > 2,000,000,000 constitutional isomers

C10H16 (10 heavy atoms): 24,938 constitutional isomers

C30H48O2 (32 heavy atoms): >> 10^12 constitutional isomers
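
The heavy-atom counts quoted for these formulas are simply the number of non-hydrogen atoms, which can be checked with a few lines of code. This is a minimal sketch with a hypothetical helper; it assumes plain formulas such as C13H16O3 with no parentheses, charges or isotope labels.

```python
import re


def heavy_atom_count(formula: str) -> int:
    """Count non-hydrogen atoms in a simple molecular formula.

    Handles element symbols followed by optional counts (e.g. "C10H16");
    parentheses, charges and isotopes are not supported.
    """
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":
            # A missing count means a single atom of that element.
            total += int(count) if count else 1
    return total


if __name__ == "__main__":
    for formula in ("C10H16", "C13H16O3", "C30H48O2"):
        print(formula, heavy_atom_count(formula))
```

Counting atoms is cheap; it is the enumeration of all constitutional isomers for a given count that explodes combinatorially.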

Funding and Collaborators

UK Research Councils (BBSRC, MRC)

European Commission

Thanks for your attention