Infrastructures for research and innovation

Professor Ewan Birney FRS
Director, EMBL-EBI
www.ebi.ac.uk

Outline of talk

• Who am I, and what is EMBL?
• The change in genomics
• The need for stratified patients in clinical care and drug discovery
• Europe’s assets
• A path to releasing Europe’s strengths

The European Molecular Biology Laboratory

• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Rome, Italy: Neuroscience
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology

6 sites in Europe
>1600 personnel
80+ nationalities

Ewan Birney

• Led the original team that analysed the human genome (gene sets)
• Algorithm research in genomic information
• Set up many key databases in genomics (e.g., Ensembl)

• Director of EMBL-EBI
• Non-executive director for Genomics England (NHS clinical genomics)
• Formal advice to the UK, Finnish, Danish and US governments; informal advice to other governments
• Advisor to both large (GSK) and small (Oxford Nanopore) companies
• Chair of the Global Alliance for Genomics and Health (GA4GH)

We have been living through a revolution.

One genome 2003 to 2018

The cost of sequencing a genome in 2018

The cost of sequencing a genome in 2003

Imaging: new technologies change the game

• EM tomography, atomic-scale models from EM
• Super-resolution light microscopy
• High-resolution MRI and CT
• Light sheet microscopy

Genomics: from research to healthcare

Research

• English language
• Light-weight legal
• Similar systems
• Open data
• Publications
• Grant funding

Practicing Medicine

• National language
• Heavy legal framework
• Different systems
• Closed data
• Not published
• Contract funding

Big numbers!

Stratification of Patients

[Figure: stratification into Class A, Class B and Class C]

Benefits of stratification

• In clinical practice
  • Better diagnosis and prognosis
  • Better use of (expensive) medicines (“personalised” medicine)
  • Specific care pathways optimised for the cases
• In drug discovery
  • More clarity on the therapeutic goals in early development
  • Cheaper Phase II and Phase III trials that are more likely to succeed

4 Pillars of stratification

• Very large virtual cohorts, ideally with population-scale ascertainment
• At-scale genomic assays
• Harmonised representation of key aspects of EHRs
• Clear legal basis to access appropriate data and approach patients

Europe’s Assets

Well regulated, often state run healthcare

• Total population size of >200 million
• The largest coherent EHR records in the world (Denmark, 6 million Danish citizens)
• Sweden, Norway, Finland all have good record keeping
• Large, predominantly state run systems in France and UK
• Historical as well as future health data

The most advanced clinical + population genomics programs globally

• Finland: >10% of the population sequenced in 5 years
• Estonia: aiming for all 1 million biobanked
• Denmark: 5 million EHRs, 100,000 sequenced
• UK: goal of 5 million with genomic assays within 5 years
• France: clinical + population-scale assays for ~1 million within 5 years
• Spain: variety of regional programs with scale to millions

A European Framework: MEGA

Genomic Infrastructure

• EMBL-EBI
  • World leader in genome information and analysis
  • The most comprehensive life science datasets globally
• ELIXIR
  • European-wide network with national nodes to connect local research and healthcare

ELIXIR Node Map

Associated Institutes

• ELIXIR-BE: Katholieke Universiteit Leuven
• ELIXIR-BE: University of Antwerp
• ELIXIR-BE: University of Liège
• ELIXIR-BE: Vrije Universiteit Brussel
• ELIXIR-BE: Universiteit Hasselt
• ELIXIR-BE: Interuniversity Institute of Bioinformatics Brussels
• ELIXIR-CZ: Masaryk University (CEITEC)
• ELIXIR-CZ: Masaryk University (CERIT-SC)
• ELIXIR-CZ: Institute of Chemical Technology
• ELIXIR-CZ: Institute of Experimental Botany AS CR, v. v. i.
• ELIXIR-CZ: Institute of Molecular Genetics of the ASCR
• ELIXIR-CZ: Institute of Microbiology ASCR
• ELIXIR-CZ: Cesnet
• ELIXIR-CZ: University of South Bohemia

The need for infrastructure

• Clinical Record + Diagnosis
• National Genome Database
• Reference Infrastructure

A vibrant commercial research sector

• Many large-scale European pharmaceutical companies
  • Sanofi, GSK, Roche, AstraZeneca, Novartis
  • Balance of US vs European research intensity
• Vibrant SME community
  • Based around clusters: Heidelberg-Stuttgart-Munich-Basel, Paris-Brussels-Amsterdam, Oxford-London-Cambridge, Barcelona, Stockholm-Helsinki
• Public-private partnerships
  • IMI
  • OpenTargets @EMBL-EBI

A path for European stratified populations

Alignment of European programs

• Million genomes declaration
• EMBL-EBI and ELIXIR (ESFRI) as genomic infrastructure
• IMI programs as an instrument to foster cross-institutional, trans-national, public-private partnerships

Engagement with nation state health strategy

• Practical “on the ground” implementation is in the hands of the operations and regulation of the healthcare systems in Europe
  • Source of EHR information
  • Source of genomic information
• Fundamental need to have >100 million person cohorts will drive trans-national work
  • Clear for smaller countries that between-country federation is needed
  • Clear for rare disease in all countries; will become relevant to more diseases

Engagement with global structures

• Europe has to tackle trans-national coordination far earlier than the US or Chinese systems
  • Similar opportunity as mobile phone GSM standards: the need for ultimately trans-national access places Europe as the leader in how to solve this
  • Legal and ethical components (GDPR)
  • Technical components
• Leadership in global bodies, such as GA4GH (Global Alliance for Genomics and Health)

EMBL-EBI
Follow me on twitter @ewanbirney

Thank you!

1/11/2019

Our mission

• Deliver excellent research
• Train the next generation of scientists
• Engage with European industry
• Coordinate bioinformatics in Europe
• Deliver scientific services

Life science: many data types

Genes, genomes & variation

Gene, protein & metabolite expression

Protein sequences, families & motifs

Macromolecular structures

Interactions, reactions & pathways

Chemogenomics & metabolomics

Phenotypes

Data resources at EMBL-EBI

Literature & ontologies
• Experimental Factor Ontology
• Gene Ontology
• BioStudies
• Europe PMC

Chemical biology
• ChEBI
• ChEMBL
• SureChEMBL

Molecular structures
• Protein Data Bank in Europe
• Electron Microscopy Data Bank

Gene, protein & metabolite expression
• Expression Atlas
• MetaboLights
• PRIDE
• RNAcentral

Protein sequences, families & motifs
• InterPro
• Pfam
• UniProt

Genes, genomes & variation
• Ensembl
• Ensembl Genomes
• GWAS Catalog
• Metagenomics portal

Systems
• BioModels
• BioSamples
• Enzyme Portal
• IntAct
• Reactome

Molecular Archives
• European Nucleotide Archive
• European Variation Archive
• European Genome-phenome Archive
• ArrayExpress

~410 people

Worldwide collaborations

See the live map at www.ebi.ac.uk/about/our-impact

Global reference data

Big data, big demand

• ~27 million requests to EMBL-EBI websites every day
• Sustainable funding: over 40 different funding agencies worldwide
• Forward commitment of over £100 million
• EMBL-EBI delivered US$ 1-5 billion in efficiency savings worldwide
• Scientists at over 3.2 million unique IP addresses use EMBL-EBI websites

Research groups at EMBL-EBI

Nick Goldman, Oliver Stegle, John Marioni, Janet Thornton, Zamin Iqbal, Evangelia Petsalaki, Virginie Uhlmann, Daniel Zerbino, Paul Flicek, Moritz Gerstung, Rob Finn, Alvis Brazma, Pedro Beltrao, Alex Bateman, Ewan Birney, Andrew Leach

Research data at EMBL-EBI

• Proteomic & RNA comparison
• Evolution of phosphorylation sites
• Mutations affecting proteins implicated in rare diseases
• Modeling unwanted variation in single-cell transcriptome studies
• Genomics of infectious disease
• Single Cell Genomics
• Translational bioinformatics

EMBL Research Community

[Research group picture]

~170 people
~50 visitors / year

Medical Genomics

Serious efforts under way

• Genomics England
  • 100,000 Genomes by end of 2019 (35,000 done now)
  • Long term 60K-100K from “routine healthcare” across NHS
• Plan France Génomique
  • ~100,000 genomes / year by 2025, first sites selected
• Iceland
  • 40% of the population genotyped/sequenced + imputed
• Switzerland
  • SPRT program to promote genomic medicine
• Finland
  • At least ~10% (0.5 million) of the population with sequence data by 2020
• US: complex payer/insurance-led market
  • Mixture of HMO (Geisinger) and NIH (All of Us, mainly a cohort)

Genomics: from research to healthcare

Research

• English language
• Light-weight legal
• Similar systems
• Open data
• Publications
• Grant funding

Practicing Medicine

• National language
• Heavy legal framework
• Different systems
• Closed data
• Not published
• Contract funding

Bridges need at least two anchors

Long-term goals

• Ideal: “Institute for Biomedical Informatics” in each country
• Large nations/populations: distributed network with a clear centre of gravity
• EMBL-EBI & ELIXIR handle research data: reference collections and sharing amongst researchers (including clinical)
• Institute for Biomedical Informatics:
  • Responsible for exploiting molecular reference data
  • Provides the national link and point of reference (e.g., around legislation)
  • Broker for research data (back to EMBL-EBI, NCBI & ELIXIR)

France EMBL-EBI

Basic Research

• Working collaboratively with ELIXIR-France
  • Orphanet, CAZy
• Support training in bioinformatics
• Ensuring French scientists and institutes exploit EMBL-EBI
  • Seamless APIs to allow submission of data driven by institutes (less complexity for user/scientist, use EMBL-EBI as backup)
  • GDR Mediatec for Chemical Ecology -> MetaboLights
  • Genoscope DNA data -> ENA
• Research work with French research scientists
  • Institut Pasteur, Institut Curie links
• French Embassy internships at EMBL-EBI

Applied Research: Medicine

• Ensuring transfer of skills and expertise to the French medical system
  • France’s medical genomics must be run and delivered in France (obviously!)
  • Technical aspects, e.g., archiving DNA data at scale nationally
• Reference human biology resource
  • Orphanet
  • Infectious epidemiology/bacterial genome sequencing?
• Working with ELIXIR-France and others for international standards
  • ELIXIR’s role in GA4GH standards

Big numbers!

Global standards: the GA4GH

• GA4GH is THE standards-setting body for genomics and healthcare
  • Embraces federated approach
  • Setting community standards early
• Cloud: analysis carried out where the data ‘lives’
• “You’re already using it!”: SAM/BAM/CRAM/VCF formats
• Tools: htsget, the first step away from file-based access
• Rare disease diagnoses: Matchmaker Exchange
• Federated discovery: GA4GH Beacons

Federation

• Open research data
  • Aggregate data globally
  • Download, analyse locally
• Healthcare data with research use
  • Analyse data locally (via VMs)
  • Collate analyses
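
The federated pattern for healthcare data (analyse locally, then collate) can be illustrated with a minimal Python sketch. Everything here is hypothetical and is not a GA4GH API or an EMBL-EBI implementation: the names SiteSummary, analyse_locally and collate, the site labels and the toy genotype lists are assumptions made for illustration only. The point is that patient-level values stay at each site and only aggregate counts reach the coordinator.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts a site releases; no patient-level rows (hypothetical type)."""
    site: str
    carriers: int
    participants: int


def analyse_locally(site: str, genotypes: Iterable[int]) -> SiteSummary:
    """Runs inside the site's own environment.

    `genotypes` holds one alternate-allele count (0, 1 or 2) per participant.
    Only the two totals below ever leave the site.
    """
    values = list(genotypes)
    carriers = sum(1 for g in values if g > 0)
    return SiteSummary(site=site, carriers=carriers, participants=len(values))


def collate(summaries: List[SiteSummary]) -> float:
    """Runs at the coordinator, which sees summaries only.

    Returns the pooled fraction of participants carrying the variant.
    """
    total = sum(s.participants for s in summaries)
    if total == 0:
        raise ValueError("no participants in any summary")
    return sum(s.carriers for s in summaries) / total


if __name__ == "__main__":
    # Toy data standing in for two national cohorts (made up for this sketch).
    site_data = {
        "site_a": [0, 1, 0, 2],
        "site_b": [0, 0, 0, 0, 1, 0],
    }
    summaries = [analyse_locally(name, g) for name, g in site_data.items()]
    print(collate(summaries))
```

A real deployment would also need authentication, a legal basis for access, and safeguards against disclosure from small counts, none of which this sketch attempts.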
