Using Standards in Real Life - BioMedBridges...2014/11/03  · BioMedBridges/ELIXIR Resource...

Using Standards in Real Life

Helen Parkinson (EMBL-EBI) & Morris Swertz (UMCG)

BioMedBridges AGM Florence, 10-12 March 2014

On behalf of WP3 partners and collaborators

What, how and who?

o  What? Addition of scientific value between the ESFRI-BMS Research Infrastructure domains
o  Who? For users and developers of the infrastructures, represented by use case pilots within the project and more widely
o  How? Catalog, review, modification, registration, development and implementation of identifier, content, format and semantic standards supporting:
    •  Data Exchange
    •  Data Integration
    •  Infrastructure development, delivering new tools and supporting data analysis

Connectivity

Themes: Identifiers

• Standardize identifier usage and drive technical implementation with identifier resources

IC50, mode of action, target, adverse events, clinical trials, etc

Themes: Standards

• Standardize identifier usage
• Support the use of standards and promote interoperability of standards via a registry and the mappings between them

Access to tools, semantic interoperability

• Standardize identifier usage
• Support the use of standards, promote interoperability
• Provide access to tools
• Semantic interoperability (ontologies)

[Figure: ontology fragment with IS-A relations among the terms TYPE 1 DIABETES MELLITUS, MODY SYNDROME, RARE INSULIN DEPENDENT DIABETES MELLITUS and METABOLIC DISEASE]
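To make the ontology bullet concrete, the sketch below shows how IS-A links let a query for a broad term also retrieve data annotated with narrower terms. The parent/child arrangement of the four terms is an assumption made purely for illustration (the figure itself does not survive in this text), and `IS_A`, `ancestors` and `is_subtype` are hypothetical names, not part of any ontology library.

```python
# Hypothetical IS-A edges (child -> parents). The arrangement is assumed
# for illustration only; it is not taken from a released ontology.
IS_A = {
    "type 1 diabetes mellitus": ["rare insulin dependent diabetes mellitus"],
    "MODY syndrome": ["rare insulin dependent diabetes mellitus"],
    "rare insulin dependent diabetes mellitus": ["metabolic disease"],
    "metabolic disease": [],
}


def ancestors(term, edges=IS_A):
    """Return every term reachable from `term` by following IS-A links."""
    seen = set()
    stack = list(edges.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(edges.get(parent, []))
    return seen


def is_subtype(term, candidate_ancestor, edges=IS_A):
    """True if `term` is the candidate itself or one of its descendants."""
    return term == candidate_ancestor or candidate_ancestor in ancestors(term, edges)


# Hypothetical sample annotations: a query for the broad term should
# also return samples annotated with the narrower terms.
samples = {
    "S1": "MODY syndrome",
    "S2": "type 1 diabetes mellitus",
    "S3": "metabolic disease",
}
matching = sorted(s for s, t in samples.items() if is_subtype(t, "metabolic disease"))
```

Under these assumed edges, a search for "metabolic disease" matches all three samples, whereas an exact string comparison would match only the one annotated with that term.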

Define: standards?

Respondents classified into three categories across all domains: majority serving data; all using data; some both serving and using data.

[Chart labels: NO, YES]

[Figure, built up over several slides, with the labels DATA, KNOWLEDGE, IDENTIFIERS, FORMATS and ONTOLOGIES and the annotation "*1033"]

What is this identifier for? How/where do I convert it?
How do I convert my format to get this tool to work?
I want to merge two datasets and co-analyse them; what tools can I use?
What’s the best analysis tool for my problem?
I need a web tool; I can’t install stuff on my desktop.

How do I ….?

Gene Ontology Enrichment Analysis: to establish whether a subset of, e.g., genes from a microarray analysis is enriched for some biological function coded using the Gene Ontology, e.g. immune response
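As a rough illustration of what such an enrichment test computes, the sketch below uses a one-sided hypergeometric test, a common choice for this kind of analysis; it is not the method of any particular tool mentioned in these slides. The function name `enrichment_p_value` and the gene counts are made up for the example.

```python
from math import comb


def enrichment_p_value(population, annotated, study, study_annotated):
    """One-sided hypergeometric tail probability P(X >= study_annotated).

    population      -- genes in the background (e.g. all genes on the array)
    annotated       -- background genes annotated with the GO term
    study           -- genes in the subset of interest
    study_annotated -- genes in the subset that carry the GO term
    """
    total = comb(population, study)
    upper = min(annotated, study)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, study - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 20 background genes, 5 of them annotated with
# "immune response"; a subset of 4 genes contains 3 annotated ones.
p = enrichment_p_value(population=20, annotated=5, study=4, study_annotated=3)
```

A small p-value suggests the term is over-represented in the subset. Tools that test many GO terms at once typically also correct for multiple testing, which this sketch leaves out.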

BioMedBridges/ELIXIR Resource Registry

• Provides a simple search interface
• Content: 1943 tools etc., 22,232 annotations
    • E.g. URL, text, ontology term: type, formats ...
• Classifies tools using an ontology
    • E.g. Sequence analysis tool
• Download complete content
• Supports a wide scope of tools
• Provides an interface to the literature
• Simple spreadsheet population
• Domain neutral
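The kind of record and search described above can be sketched as follows. This is a hypothetical layout, not the registry's actual schema or API: `ToolEntry`, `search`, the example tools, their URLs and the "Enrichment analysis tool" type are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolEntry:
    name: str
    url: str
    description: str
    tool_type: str  # ontology term classifying the tool
    formats: list = field(default_factory=list)


def search(entries, text="", tool_type=None, fmt=None):
    """Filter entries by free text, ontology type and supported format."""
    text = text.lower()
    hits = []
    for entry in entries:
        searchable = (entry.name + " " + entry.description).lower()
        if text and text not in searchable:
            continue
        if tool_type is not None and entry.tool_type != tool_type:
            continue
        if fmt is not None and fmt not in entry.formats:
            continue
        hits.append(entry)
    return hits


# Invented example entries.
REGISTRY = [
    ToolEntry("ExampleAligner", "http://example.org/aligner",
              "Pairwise sequence alignment", "Sequence analysis tool", ["FASTA"]),
    ToolEntry("ExampleEnricher", "http://example.org/enricher",
              "Gene Ontology enrichment of gene lists", "Enrichment analysis tool", ["TSV"]),
]

enrichment_tools = search(REGISTRY, text="enrichment")
```

Matching on the description and the ontology term, rather than on the tool name alone, means a user can find a tool without already knowing what it is called.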

http://bioregistry.cbs.dtu.dk/

… and the user?

"...when a tool updated its GO data is quite important..."
"...I used GOrilla in the end as my requirements were pretty simple; a straight enrichment study and the graphics and interface were clean and easy to understand..."

Clever naming doesn’t help users search for things!

Registry Future

• Building on 7 user engagement workshops, >125 requests for new interface features
• Sustainability via ELIXIR Danish node collaboration
• Federated content sharing between registries and projects, esp. cross domain, e.g. with EuroBioImaging, BioCatalogue etc.
• Benchmarking, e.g. comparison of GO tools
• Projects adopting the code for local use
• Addressing interoperability -> automation and an interoperable toolkit

http://bioregistry.cbs.dtu.dk/

Data pooling

Discover and integrate representative population data sets (cohorts) to validate/recalibrate disease prediction models

[Figure: workflow between DATA and KNOWLEDGE with the steps DISCOVER TOOLS & DATA, HARMONISE DATA, DERIVE/ACQUIRE DATA and ANALYSE]

Aim 1. Find a cohort that we can test the model on and that is representative of our local population -> results will be relevant to clinical practice
Aim 2. Find a cohort that has metadata in common with the model we wish to test, or has metadata that can be converted

Harmonise Data

Harmonisation Process

• Identify the data elements from the published study to apply to our cohort
• Match the data elements, e.g. "Parental diabetes" vs. "Diabetes mother/father": annotate the elements with ontologies, then use query expansion, stemming and string matching to assist the user
• Convert the values, e.g. for units
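The matching and conversion steps can be sketched roughly as below. This is a minimal illustration, not the algorithm used by BioBankConnect or any other tool named in these slides: the `EXPANSIONS` table stands in for ontology-based query expansion, the stemmer is deliberately crude, and all function names, candidate labels and unit factors are assumptions made for the example.

```python
import re

# Hypothetical expansion table standing in for ontology-based query expansion.
EXPANSIONS = {"parental": {"parent", "mother", "father"}}

# Illustrative unit factors: multiply a value in the source unit to get the target unit.
UNIT_FACTORS = {("cm", "m"): 0.01, ("g", "kg"): 0.001}


def stem(token):
    """Very crude suffix stripping; a real stemmer would do better."""
    for suffix in ("al", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 3:
            return token[: -len(suffix)]
    return token


def tokens(label):
    """Lower-case, split, expand and stem the words of a data element label."""
    expanded = set()
    for word in re.findall(r"[a-z]+", label.lower()):
        expanded.add(stem(word))
        expanded.update(stem(extra) for extra in EXPANSIONS.get(word, ()))
    return expanded


def similarity(a, b):
    """Jaccard overlap between the expanded token sets of two labels."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_matches(element, candidates, top=3):
    """Rank candidate cohort elements so a user can confirm the match."""
    return sorted(candidates, key=lambda c: similarity(element, c), reverse=True)[:top]


def convert(value, source_unit, target_unit):
    """Convert a value between units listed in UNIT_FACTORS."""
    return value * UNIT_FACTORS[(source_unit, target_unit)]


cohort_elements = ["Body mass index", "Diabetes mother/father", "Diabetes medication"]
ranked = best_matches("Parental diabetes", cohort_elements)
```

The ranking is only a suggestion list: the point of the process described here is to assist the user, who still decides which candidate element really corresponds to the one in the published study.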

Derive/Acquire Data

Perform Reproducible Analysis

Data Pooling Conclusions

o  Three models assessed for prediction in the Netherlands population
o  Models applicable in both populations after calibration
o  Easier to do using tools than by hand
o  New tools to support this and other pooling scenarios, including other domains
o  Standards registry
o  Samples integration across infrastructures in BioSamples
o  Harmonization tools
o  BioBankConnect: new libraries for sample conversion
o  Zooma: data/ontology mapping tools
o  Access to algorithms via tools registry

60 POSTERS & DEMOs USER ENGAGEMENT

Acknowledgements

• All BMB project partners, personnel and their BMS Infrastructure colleagues
• Registry
    • Kristoffer Rapacki, Emil Rydza, Piotr Chmura (ELIXIR DK Node)
    • Chris Mungall (Gene Ontology) and Anita Bandrowski (NeuroInformatics Framework)
    • BioCatalogue, Carole Goble, Niall Beard, Aleksandra Nenadic (ELIXIR UK Node)
    • eTRIKs, TrAIT, Biosharing.org, TransMart, IMPC, RD-Connect, DIACHRON, BioShare (Chao Pang)
    • The BioMedBridges project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 284209
• EMBL Core Funds, Parkinson, Birney Teams
