
Overview of EMBL-European Bioinformatics Institute and Interactions with CDISC

Dominic Clark

Industry Programme Manager

[email protected]

www.ebi.ac.uk/industry


Key topics

• EMBL-EBI Background, Services and Standards activities

• EMBL-EBI working with Industry

• The Genomic Standards Consortium

• Challenges ahead


OUR MISSION

To contribute to the advancement of biology through basic investigator-driven research in bioinformatics


What is EMBL-EBI?

• Part of the European Molecular Biology Laboratory

• International, non-profit scientific institute

• Europe’s hub for biological data services


Where is EMBL-EBI?

• We share a campus with the Wellcome Trust Sanger Institute

• Near Cambridge, UK

Image (© John Freebury), labelled: EMBL-EBI; Hinxton data centre (most services run from data centres in London)

14/11/2013 9


EMBL member states

Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom

Associate member state: Australia


Our funders

EMBL member states: Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom

Associate member state: Australia

Other major funders: the European Commission, UK Research Councils, the US National Institutes of Health and the Wellcome Trust


EMBL-EBI users: a snapshot


The new EBI building & ELIXIR Technical hub


Who we are

~500 members of staff

~53 nationalities

~400 in services & support

~100 focus on basic research


EMBL-EBI works collaboratively

Diagram labels: Hinxton, Cambridge, UK, Europe, Global


EMBL-EBI research collaborations

We share funding and co-author publications with partner institutes throughout the world:

• 327 publications in 2011 (90% in collaboration with other institutes)

• 843 grants shared with other institutes in 2011


Data and tools for molecular life science

Services

www.ebi.ac.uk/services



From molecules to medicine

Biology is changing:

• Data explosion

• New types of data

• Emphasis on systems

• Growth of applied biology:
  • molecular medicine
  • agriculture
  • food
  • environmental sciences


Big and bigger data


Key principles about our services

• Freely available

• A comprehensive collection of molecular databases

• Globally coordinated data collection and dissemination

• Produced in collaboration with other world leaders, e.g.:

• NCBI (United States)

• Wellcome Trust Sanger Institute (United Kingdom)

• National Institute of Genetics (Japan)

• SIB Swiss Institute of Bioinformatics (Switzerland)


Data resources at EMBL-EBI

• Genes, genomes & variation

• RNA, protein & metabolite expression

• Protein sequences, families & motifs

• Molecular & cellular structures

• Reactions, interactions & pathways

• Chemical biology

• Ontologies & biological samples

• Scientific literature


Data resources at EMBL-EBI

Genomes & variation
• Ensembl
• Ensembl Genomes
• European Genome-phenome Archive
• Metagenomics

Nucleotide sequences
• European Nucleotide Archive (ENA)

Expression
• ArrayExpress
• Expression Atlas
• PRIDE
• R-Workbench

Proteins
• The Universal Protein Resource (UniProt)
• InterPro

Chemical biology
• ChEMBL
• ChEBI

Literature & ontology
• Europe PubMed Central
• Gene Ontology

Molecular structures
• Protein Data Bank in Europe
• PDBsum
• ProFunc

Pathways
• IntAct
• Reactome
• MetaboLights

Systems
• BioModels
• Enzyme Portal
• BioSamples

Patent sequences
• Non-redundant patent sequence databases
• Patent compounds


Standards development – international collaborations

• Genomes: www.geneontology.org, gensc.org

• Functional Genomics: www.fged.org

• Protein sequence: www.uniprot.org

• Proteomics: www.psidev.info/

• Protein structure: www.wwpdb.org

• Cheminformatics: www.ebi.ac.uk/chebi

• Pathways: www.reactome.org, www.biopax.org

• Systems modeling: www.sbml.org, www.sbgn.org

• Metabolomics: www.metabolomicssociety.org

• Literature and text mining: www.pistoiaalliance.org/

• Nucleotide sequence: www.insdc.org, www.barcodeoflife.org/


Database collaborations: for all our major databases, we collaborate on standards and data sharing through global data-sharing agreements.


2005: The Genomic Standards Consortium

• A vast and rich body of information has grown up as a result of the world’s enthusiasm for ’omics technologies. Finding ways to describe this information and make it available so as to maximise its usefulness has become a major effort across the ’omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities.

• The GSC calls on the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes and marker gene sequences.


The GSC’s Mission

• the implementation of new genomic standards

• methods of capturing and exchanging metadata

• harmonization of metadata collection and analysis efforts across the wider genomics community


Community-driven solutions

The path:

• Identify the problem

• Define a community to address it

• Define scope of the solution

• Implement solution

• Gain adoption of solution


Data standardization at ENA

Petra ten Hoopen

European Nucleotide Archive


European Nucleotide Archive

http://www.ebi.ac.uk/ena/home

Permanent and comprehensive repository for public domain nucleotide sequences and associated information

• Archiving

• Helpdesk

• Training

• Standards development

• Technology development

• Community building


ENA data model

Data = raw reads and nucleotide sequence assemblies

Metadata = information associated with the sequences, including the provenance of the biological sample (sample), the sequencing experiment (experiment) and its scope (study), the analysis and annotation of the sequences (analysis), and the files of raw data (run)

Diagram objects: Study, Experiment, Analysis, Sample, Run, Data
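The object graph above can be sketched as plain Python dataclasses. This is a simplified illustration, assuming that each experiment links one study and one sample and that each run belongs to one experiment; the class names, field names and example aliases are hypothetical and are not the ENA XML schema or submission interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Study:
    """Scope of a sequencing effort (the 'study' object)."""
    alias: str
    title: str


@dataclass
class Sample:
    """Provenance of the biological material (the 'sample' object)."""
    alias: str
    organism: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Experiment:
    """A sequencing experiment performed on one sample within one study."""
    alias: str
    study: Study
    sample: Sample
    platform: str


@dataclass
class Run:
    """The raw data files produced by one experiment."""
    alias: str
    experiment: Experiment
    files: List[str]


@dataclass
class Analysis:
    """Derived results (e.g. assemblies or annotation) linked to a study and its samples."""
    alias: str
    study: Study
    samples: List[Sample]
    files: List[str]


def runs_for_study(study: Study, runs: List[Run]) -> List[Run]:
    """Follow run -> experiment -> study links to find the runs belonging to a study."""
    return [run for run in runs if run.experiment.study is study]


# Hypothetical example objects
soil_study = Study("study_1", "Soil metagenome survey")
soil_sample = Sample("sample_1", "soil metagenome", {"country": "France"})
experiment = Experiment("exp_1", soil_study, soil_sample, "ILLUMINA")
run = Run("run_1", experiment, ["run_1_R1.fastq.gz", "run_1_R2.fastq.gz"])
```

Keeping the raw data (runs) separate from the descriptive metadata (study, sample, experiment) means a single sample description can be reused by many experiments and runs without being repeated.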


ENA data standardization

Standardized reporting requirements for all metadata and data objects (Study, Experiment, Analysis, Sample, Run, Data):

• agreed by the INSDC (International Nucleotide Sequence Database Collaboration) and by consortia of domain-specific scientific experts (e.g. GSC, MicroB3, RNAcentral)

• implemented with community-agreed checklists and controlled vocabularies, and data-type-specific file formats
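A checklist with mandatory fields and controlled vocabularies can be applied to a metadata record in a few lines. The sketch below is illustrative only: the checklist name, field names and allowed terms are hypothetical and do not reproduce any actual ENA or GSC checklist, and real submission-time validation does considerably more.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ChecklistField:
    name: str
    mandatory: bool = False
    # When set, the value must be one of these terms (a controlled vocabulary)
    vocabulary: Optional[Set[str]] = None


@dataclass
class Checklist:
    name: str
    fields: List[ChecklistField] = field(default_factory=list)

    def validate(self, record: Dict[str, str]) -> List[str]:
        """Return a list of problems with the record; an empty list means it passes."""
        problems = []
        for f in self.fields:
            value = record.get(f.name, "").strip()
            if not value:
                if f.mandatory:
                    problems.append(f"missing mandatory field '{f.name}'")
                continue
            if f.vocabulary is not None and value not in f.vocabulary:
                problems.append(f"'{value}' is not an allowed value for '{f.name}'")
        return problems


# Hypothetical checklist: field names and allowed terms are illustrative only
demo_checklist = Checklist("demo soil sample", [
    ChecklistField("organism", mandatory=True),
    ChecklistField("collection date", mandatory=True),
    ChecklistField("country", mandatory=True,
                   vocabulary={"France", "Germany", "United Kingdom"}),
    ChecklistField("depth"),
])
```

Because the checklist is data rather than code, one validator can serve any number of community-agreed checklists; adding a checklist means adding a definition, not new validation logic.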


ENA checklists

30 checklists for assembled and annotated sequences in the Webin submission system

Large scale

• WGS unannotated

• WGS annotated

• EST

• GSS

• STS

• TSA unannotated

• TSA annotated

Community Standards

• Barcode COI

• MIMARKS 16S

• MIMARKS soil sample 16S

RNA

• Single CDS mRNA

• Single viral CDS genomic RNA

• ssRNA viral polyprotein

• ssRNA viral cRNA

DNA

• Single CDS genomic DNA

• MHC gene 1-exon

• MHC gene 2-exons

• Gene intron

• ITS region

• ETS region

• IGS

• Phylogenetic marker

• COI gene

• D-loop

• trnK-matK locus

• Satellite DNA

• Betasatellite

• rRNA gene

• 16-23S ISR

• Gene promoter


Power of ENA checklists

ENA-implemented checklists support and improve:

• consistent reporting

• user-friendly data submission

• data validation

• data retrieval

• data discovery

• data interoperability

• data usability


ENA-implemented checklists:

1. help to achieve the objectives of data standardization efforts

2. assist both data submitters and data users


The EMBL-EBI Industry Programme


We support larger companies through the “Industry Programme”

• For the past 17 years the Industry Programme has been an integral part of EMBL-EBI, providing ongoing, regular contact with key stakeholder groups.

• Established in 1996, the programme is now a mature, subscription-funded service for larger companies.

• Through the Industry Programme, EMBL-EBI provides specialist workshops, standards-based activities and pre-competitive research and development opportunities of particular relevance to programme members.


www.ebi.ac.uk/industry


Industry Programme members


The EMBL-EBI Industry Programme

• Relationship between industry members, EMBL-EBI and

our collaborators.

• Enabling industrial uptake of innovations in bioinformatics

• Knowledge Exchange workshops with world leaders and key opinion leaders (KOLs)

• Neutral ground for members to explore strategic

developments and concepts

• Input into services development

• Pre-competitive collaboration

• Standards development

• Technical development


Early success: development of the MIAME standard

• MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al., Nature Genetics]

• The public repositories ArrayExpress at the EBI (UK), GEO at NCBI (US) and CIBEX at DDBJ (Japan) are designed to accept, hold and distribute MIAME-compliant microarray data.


The six most critical elements contributing towards MIAME are:

• The raw data for each hybridisation (e.g., CEL or GPR files)

• The final processed (normalised) data for the set of hybridisations in the experiment (study) (e.g., the gene expression data matrix used to draw the conclusions from the study)

• The essential sample annotation including experimental factors and their values (e.g., compound and dose in a dose response experiment)

• The experimental design including sample data relationships (e.g., which raw data file relates to which sample, which hybridisations are technical, which are biological replicates)

• Sufficient annotation of the array (e.g., gene identifiers, genomic coordinates, probe oligonucleotide sequences or reference commercial array catalog number)

• The essential laboratory and data processing protocols (e.g., what normalisation method has been used to obtain the final processed data)
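The six elements above can be treated as a completeness checklist for a submission. This minimal sketch only checks that each element is present and non-empty, not whether its content is adequate; the short keys and the example submission are hypothetical and are not an ArrayExpress, GEO or CIBEX format.

```python
from typing import Dict, List

# The six MIAME elements listed above, keyed by short (hypothetical) identifiers
MIAME_ELEMENTS: Dict[str, str] = {
    "raw_data": "raw data for each hybridisation (e.g. CEL or GPR files)",
    "processed_data": "final processed (normalised) data for the set of hybridisations",
    "sample_annotation": "essential sample annotation, including experimental factors and their values",
    "experimental_design": "experimental design, including sample-data relationships",
    "array_annotation": "sufficient annotation of the array",
    "protocols": "essential laboratory and data processing protocols",
}


def missing_miame_elements(submission: Dict[str, object]) -> List[str]:
    """Return descriptions of MIAME elements that are absent or empty in a submission."""
    return [
        description
        for key, description in MIAME_ELEMENTS.items()
        if not submission.get(key)
    ]
```

For example, a submission that supplies everything except its protocols would be reported as missing only the "essential laboratory and data processing protocols" element.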


MIBBI: Minimum Information for Biological and Biomedical Investigations

• The MIBBI project promotes existing efforts to develop minimum information guidelines for reporting biological and biomedical science to the wider community. This information is progressively being moved to BioSharing, which also provides additional search and linking functionality to connect guidelines with the terminologies and exchange formats used by the community.

• There are 38 MIBBI records in BioSharing: http://www.biosharing.org/standards/mibbi


Knowledge Exchange Workshops

• The Industry Programme organises high quality workshops and

symposia, providing expert level presentations and strategic

discussion opportunities for members and other key opinion

leaders.

• Workshops:

  • Are prioritised by Industry Programme (IP) members based on proposals
  • Are organised through a planning team
  • Include key opinion leaders as speakers
  • Include appropriate stakeholders
  • Are by individual or collective invitation only
  • Are facilitated
  • Take a significant amount of planning


Member-driven workshops

• Computational systems biology

• Data integration


Workshops in 2012

Workshop title | Date
Using electronic health records (EHRs) for translational bioinformatics | Feb 2012
Chemogenomics | Mar 2012
1000 Genomes Project | Apr 2012
R & Bioconductor training workshop | May 2012
Metabolomics | May 2012
Antibody Informatics | June 2012
Systems Biology for Toxicology Pathways | Sept 2012
Secure Hosted Services | Oct 2012
1000 Genomes and NGS Data Analysis (Novartis site, Cambridge, MA) | Nov 2012
Pre-clinical Safety Data (EMBL, Heidelberg, DE) | Nov 2012


Industry Programme Workshops for 2013


Workshop title | Date
Oncogenomics | 13-14 Mar 2013
Overview of Biomedical Ontologies | 17-18 Apr 2013
Biomarkers | 23-24 Apr 2013
ENCODE and Epigenomics | 19-20 Jun 2013
Data Integration and its application | 18-19 Sep 2013
Translational informatics | 23-24 Oct 2013
Oncogenomics (Pfizer, Pearl River, NY) | 14-15 Nov 2013
Computational tools for chemical biology, phenotypic screening & target de-convolution | 21-22 Nov 2013
RNA-seq data analysis | 11-12 Dec 2013


Dates for 2014


Workshop title | Date
Rare Diseases and drug repositioning | 24-25 Mar 2014
ENCODE Workshop (Cambridge, MA, USA) | 15-16 Apr 2014
EBI/EuroDISH/NuGO workshop on Nutrition Information, Ontologies and Nutrigenomics | 29-30 Apr 2014
Systems Pharmacology | 7-8 May 2014
Biologics | 21-22 May 2014
Shared Data, Shared Cost | 18-19 June 2014


What happens after workshops?

• Presentations are made available on the Industry Programme members’ website

• A short report is produced

• Where appropriate, EMBL-EBI acts as a coordinator or broker in establishing pre-competitive collaborations and initiatives between Industry Programme members (and third parties: academic groups, funding organisations, other commercial companies)

• Publications, e.g. from 2011:
  • MIABE paper in Nature Reviews Drug Discovery (NRDD)
  • Tox ontology roadmap papers


Published Sept. 2011, PMID: 21878981


Major challenges remain


Variation: EGA and GWAS

• Explore datasets from Genome-Wide Association Studies (GWAS)

• All types of sequence and genotype experiments:
  • Case control
  • Population
  • Family studies

• SNP and CNV genotypes from array-based methods

• Genotyping done with re-sequencing methods

European Genome-phenome Archive: www.ebi.ac.uk/ega


A Global Alliance for sharing genomic and clinical data

• EMBL and EMBL-EBI have joined the Global Alliance, a large-scale, international effort to enable the secure sharing of genomic and clinical data. The Global Alliance invites commercial and not-for-profit organisations to join forces with other leading data, health care, research and disease advocacy organisations to establish an evidence base for genomic research and medicine that adheres to the highest standards of ethics and privacy.

• A White Paper circulated in early 2013 has the support of nearly 70 organisations in Asia, Australia, Africa, Europe, North America and South America that are committed to creating a common framework that supports data analysis and protects the autonomy and privacy of participating individuals. Signatories of an accompanying Letter of Intent to create a not-for-profit, inclusive, public–private, international, non-governmental organisation include healthcare providers, research institutions, disease advocacy groups, and life science and information technology companies. Many more are expected to join.


Summary

• EMBL-EBI is one of the global leaders in the storage,

annotation, interrogation and dissemination of large datasets

of relevance to the bio-industries.

• Standards are an important part of international data

exchange and effective utilisation of information.

• We work closely with industry in developing new standards.

• Major challenges remain.


Acknowledgements

• Peter Sterk (University of Oxford), secretary of the GSC

• Petra ten Hoopen (EMBL-EBI)