SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium Katy Wolstencroft, University of Manchester, UK


Page 2

Systems Biology of Microorganisms

Pan-European collaboration: eleven individual projects, 91 institutes

Different research outcomes
A cross-section of microorganisms, including bacteria, archaea and yeast

Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how

Running since April 2007; runs for 3-5 years

http://www.sysmo.net

Page 3

The Problem

No single concept of experimentation or modelling

No planned, shared infrastructure for pooling

Page 4

SysMO-DB

Started July 2008; runs for 3 years; 3 staff + 3 investigators; 3 teams over 3 sites

Sensitively retrofit a data access, model handling and data integration platform.

Support and manage the diversity of data, models and competencies.

Web-based solution for:
exchange of data, models and processes (intra- and inter-consortia)
search for data, models and processes across the initiative
dissemination of results

Page 5

Own solutions: projects already have their own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial …), plus files and spreadsheets.

Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.

Data issues: many do not follow the standards that exist, or know who is doing what.

Resource issues: no extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.

Page 6

Types of data

Multiple omics: genomics, transcriptomics, proteomics, metabolomics
Images
Reaction kinetics
Models
Relationships between data sets/experiments

Procedures, experiments, data, results and models
Analysis of data

The same across many Systems Biology projects

Page 7

Principles…

A series of small victories
Realistic
Don’t reinvent
Sustainable and extensible
Migrate to standards

Provide instant gratification
Address doubt and anxiety
Incremental development

Page 8

The Lowest Hanging Fruit

A catalogue of SysMO assets:
SysMO Yellow Pages: the people and their expertise; the institutions and their facilities
Data – experimental data sets
Data – analysed results
Data – external reference data sets
Models
Processes – laboratory protocols and bioinformatics analyses

The catalogue references assets held elsewhere.

Page 9

Technical Approach

[Diagram: SysMO DB, with the SysMO-SEEK web interface on top of the Assets and Yellow Pages catalogues and JERM, covering Data, Models and Processes]

Page 10

Social Approach: PALs

21 postdocs and PhD students: experimentalists, modellers and bioinformaticians
Our design and technical collaboration team
Very intense face-to-face and virtual collaboration
UK and Continental PALs chapters
Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

Page 11

Communication via PALs

[Diagram: communication between the DB team, the PALs and the projects]
Show what is there; suggest what is possible
Ask for requirements
Give requirements; tell priorities
Rate outcomes; suggest improvements
Double check; transmit
Disseminate
Collect answers

Page 12

Discovery: SysMO-SEEK

Single, web-based access point
Access control and versioning management
Yellow pages (“who is who”): people, expertise, equipment
Assets catalogue (“who has what”): SOPs, spreadsheets, pre-published models, metadata about data held by projects
Access to other repositories: models (JWS Online), workflows (myExperiment), public web services (BioCatalogue)
Call out to external resources, e.g. PubMed

Does not hold data and results
Holds metadata on results and links to results

A component for SysMO groups to incorporate in their own environments and applications
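To make the “holds metadata and links, not data” point concrete, here is a minimal sketch of a catalogue that only indexes assets held at project sites. The `CatalogueEntry` and `AssetCatalogue` names and their fields are hypothetical illustrations, not SysMO-SEEK's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record: metadata plus a link, never the asset itself."""
    title: str
    asset_type: str      # e.g. "SOP", "Spreadsheet", "Model"
    owner: str           # who has it ("who has what")
    project: str
    location: str        # URL of the asset, which stays at the project site
    keywords: tuple = ()


class AssetCatalogue:
    """In-memory index of catalogue entries; the assets live elsewhere."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def search(self, term):
        """Case-insensitive match on title and keywords."""
        term = term.lower()
        return [
            e for e in self._entries
            if term in e.title.lower() or any(term in k.lower() for k in e.keywords)
        ]

    def who_has(self, asset_type):
        """Answer "who has what" for one asset type, as a sorted list of owners."""
        return sorted({e.owner for e in self._entries if e.asset_type == asset_type})
```

Searching returns entries whose `location` points back to the project site, so the data owner keeps the file and SEEK only needs to know where it is.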

Page 13

Page 14

Page 15

Sharing Policies

Default: private until you say otherwise

Project defaults:
Private
Share with the group
Share with project
Share with SysMO
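The sharing levels above form an ordered scale, so one way to model them is as an ordered enum whose default is private. This is a sketch under assumptions: `Sharing`, `can_view` and `new_asset_policy` are hypothetical, and the member dictionaries with `name`, `group` and `project` keys stand in for whatever SEEK actually stores.

```python
from enum import IntEnum


class Sharing(IntEnum):
    """Sharing levels from the slide, ordered from most to least restrictive."""
    PRIVATE = 0
    GROUP = 1
    PROJECT = 2
    SYSMO = 3


def can_view(policy, owner, viewer):
    """Return True if `viewer` may see an asset shared at `policy` by `owner`.

    `owner` and `viewer` are hypothetical dicts with "name", "group" and
    "project" keys; every viewer is assumed to be a SysMO member.
    """
    if viewer["name"] == owner["name"]:
        return True                      # owners always see their own assets
    if policy >= Sharing.SYSMO:
        return True
    if policy >= Sharing.PROJECT and viewer["project"] == owner["project"]:
        return True
    if policy >= Sharing.GROUP and viewer["group"] == owner["group"]:
        return True
    return False


def new_asset_policy(project_default=None):
    """Assets start private unless the project has chosen another default."""
    return Sharing.PRIVATE if project_default is None else project_default
```

Ordering the levels means a wider policy automatically includes every narrower audience, matching the progression from private to the whole consortium.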

Page 16

“Just Enough” Exchange of SysMO Assets

Page 17

Experimental Processes

Protocols and SOPs: Nature Protocols format recommendation
Protocol Title, Authors, Keywords, Abstract, Materials (Reagents, Reagent Set Up, Equipment), Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References

You can upload protocols in any format, but if you use this one, we will index it and make searching easier.

Encouraging standardisation
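The benefit of the recommended format is that a fixed set of headings makes a free-text protocol machine-indexable. A minimal sketch, assuming plain-text protocols with one heading per line; the splitting rule and the `required` headings used to decide indexability are hypothetical choices, not SEEK's behaviour.

```python
# Section headings from the Nature Protocols-style template listed on the slide.
TEMPLATE_SECTIONS = [
    "Protocol Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
]


def split_sections(text):
    """Split a plain-text protocol into {heading: body}.

    A line counts as a heading if it matches a template heading
    (case-insensitive, optional trailing colon); other non-blank lines
    are appended to the most recent heading.
    """
    lookup = {h.lower(): h for h in TEMPLATE_SECTIONS}
    sections, current = {}, None
    for line in text.splitlines():
        key = line.strip().rstrip(":").lower()
        if key in lookup:
            current = lookup[key]
            sections[current] = ""
        elif current is not None and line.strip():
            sections[current] += line.strip() + " "
    return {h: body.strip() for h, body in sections.items()}


def is_indexable(text, required=("Protocol Title", "Abstract", "Procedure")):
    """Hypothetical rule: index the protocol if the required headings are present."""
    found = split_sections(text)
    return all(h in found for h in required)
```

Protocols in other formats are still accepted; they simply would not get per-section indexing under this rule.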

Page 18

Workflow Management System

Bioinformatics processes: workflows
Data preparation, annotation and analysis pipelines
SBML model construction and population
Linking together data sets, web services, R scripts, BioMart, Java libraries, Grid services (MATLAB in beta)

Workflows as a mechanism for linking inside SEEK

Free and open source

Page 19

Libraries of SysMO workflows

Page 20

Models

SBML is the recommended format, but not all models are SBML.

JWS Online allows storing and simulation of SBML models. But all models there need to be shared, and JWS Online doesn’t have version and access control.

Models can be shared in SEEK instead of directly in JWS Online.

SEEK can still connect to JWS Online and run simulations.
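The gap SEEK fills here is versioning and access control around model files. The sketch below is hypothetical (`VersionedModel` is not a SEEK or JWS Online class): each upload becomes a new immutable version, and only users the owner has shared with can read it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    number: int
    content: str     # model file content (SBML or any other format)
    checksum: str    # SHA-256 fingerprint of the content
    comment: str


class VersionedModel:
    """Hypothetical SEEK-side wrapper adding versions and read access control."""

    def __init__(self, owner):
        self.owner = owner
        self.readers = {owner}
        self.versions = []

    def upload(self, user, content, comment=""):
        """Store a new immutable version; only the owner may upload."""
        if user != self.owner:
            raise PermissionError(f"{user} may not upload new versions")
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        version = ModelVersion(len(self.versions) + 1, content, digest, comment)
        self.versions.append(version)
        return version.number

    def latest(self, user):
        """Return the newest version if `user` has been granted read access."""
        if user not in self.readers:
            raise PermissionError(f"{user} may not read this model")
        return self.versions[-1]

    def share_with(self, user):
        self.readers.add(user)
```

Older versions stay available in `versions`, so a simulation run against an earlier model can still be reproduced after the model is revised.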

Page 21

Models

JWS Online: a database of curated models and a model simulator; web-service enabled, so it can be run from workflows

Used and accessed through SEEK: a special instance of JWS Online for SysMO. Store, validate and run models from SysMO-SEEK, and publish later

Access to other model resources: BioModels, COPASI and semanticSBML

Page 22

Data Comparison and Exchange

Public data sources: model organism databases (e.g. SGD), BRENDA …
Data produced by SysMO: SABIO-RK, iChiP, MeMo …
Local databases and files
Excel spreadsheets: the most common format for experimental data

[Diagram: metadata for proteomics, metabolomics, microarray and single cell data]

Variable descriptions of data
Little adoption of community-controlled vocabulary terms

Page 23

JERM

JERM: “Just Enough Results Model”, the minimum information needed to exchange data
What type of data is it? Microarray, growth curve, enzyme activity…
What was measured? Gene expression, OD, metabolite concentration…
What do the values in the datasets mean? Units, time series, repeats…
Which experiment does it relate to? How was the data created? SOPs and protocols

Harvesting standards, current practice, and consortium schemas and spreadsheets
Inspired by the MCISB Key Results initiative and SBRML [Paton]
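The JERM questions above translate naturally into a record with one field per question, plus a check for what is still missing. This is a minimal sketch only: `JERMRecord`, its field names and the `REQUIRED` tuple are hypothetical and do not reproduce the actual JERM definition.

```python
from dataclasses import dataclass


@dataclass
class JERMRecord:
    """Hypothetical 'just enough' description of one data file."""
    data_type: str = ""          # what type of data: microarray, growth curve, ...
    measured: str = ""           # what was measured: gene expression, OD, ...
    units: str = ""              # what the values mean: units ...
    time_series: bool = False    # ... whether the values form a time series ...
    replicates: int = 0          # ... and how many repeats
    experiment: str = ""         # which experiment the data relates to
    sop: str = ""                # how the data was created (SOP or protocol link)


# Hypothetical minimum: fields that must be non-empty before exchange.
REQUIRED = ("data_type", "measured", "units", "experiment", "sop")


def missing_fields(record):
    """Return the required fields that are still empty, in REQUIRED order."""
    return [name for name in REQUIRED if not getattr(record, name)]


def is_just_enough(record):
    return not missing_fields(record)
```

Listing the gaps, rather than rejecting the record outright, fits the incremental approach: a partly described data set can be registered now and completed later.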

Page 24

The Idea

For each data type: transcriptomics, proteomics, metabolomics, single cell data

Define a JERM: top-down analysis of standards (ISA-TAB); bottom-up analysis of practice

Generate and apply: a JERM template; a JERM extractor for the data host; a subset registered in SEEK; access/export through the JERM interface/template

Page 25

JERM Adaptors

[Diagram: JERM adaptors connecting project data stores, including COSMIC, BaCell-SysMO, SysMOLab and MOSES, hosted on Alfresco, wikis and other datastores]

Page 26

Just Enough Results Model Tools

JERM Source Extractor Generator:
New spreadsheets adopt JERM templates
Legacy spreadsheets have a JERM mapper
Databases have a JERM mapper

Spreadsheet Ontology Annotator: restricts the values that a range of fields can have

[Diagram: the JERM Web Service Access Interface, with Access Control, sits on a JERM Extractor and Access Wrapper Layer (JERMTemplate, SourceAccess and Harvester, SourceExtractor) that connects to sources such as Metadata, SABIO-RK, BRENDA, myDB and mySpreadSheet]
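Two of the tools above, the legacy spreadsheet mapper and the ontology annotator's value restrictions, can be sketched with Python's standard `csv` module. The column mapping and the allowed media values below are invented for illustration; a real mapper would be configured per lab and per spreadsheet layout.

```python
import csv
import io

# Hypothetical mapping from one lab's legacy column headers to JERM-style names.
LEGACY_TO_JERM = {
    "Strain": "organism_strain",
    "t (min)": "time",
    "OD600": "measured_value",
    "Medium": "medium",
}

# Hypothetical controlled vocabulary restricting what a column may contain.
ALLOWED_VALUES = {
    "medium": {"LB", "M9", "YPD"},
}


def map_legacy_sheet(csv_text):
    """Rename legacy columns to JERM names; unmapped columns are dropped."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({LEGACY_TO_JERM[k]: v for k, v in row.items() if k in LEGACY_TO_JERM})
    return rows


def invalid_cells(rows):
    """Return (row_index, field, value) for values outside the allowed vocabulary."""
    problems = []
    for i, row in enumerate(rows):
        for field_name, allowed in ALLOWED_VALUES.items():
            value = row.get(field_name)
            if value is not None and value not in allowed:
                problems.append((i, field_name, value))
    return problems
```

Reporting offending cells rather than silently correcting them leaves the decision with the data owner, who can then pick a term from the allowed list.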

Page 27

SEEK + JERM

[Diagram: SEEK metadata model based on ISA-TAB. Investigation, Study and Assay are linked to People, Projects, experimental conditions and factors studied, Models, SOPs, Workflows and experimental data metadata, with homogenised terminology and values in the datasets themselves]
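The ISA-TAB structure the slide builds on is a containment hierarchy: an investigation contains studies, and each study contains assays. A minimal sketch of that hierarchy as Python dataclasses; the extra fields (`factors_studied`, `data_files`, `sops`) are hypothetical stand-ins for the SEEK links in the diagram, and data files are stored as links rather than content.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    title: str
    assay_type: str                                   # e.g. "transcriptomics"
    data_files: list = field(default_factory=list)    # links, not the data itself
    sops: list = field(default_factory=list)


@dataclass
class Study:
    title: str
    factors_studied: list = field(default_factory=list)   # e.g. ["glucose concentration"]
    assays: list = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    project: str
    people: list = field(default_factory=list)
    studies: list = field(default_factory=list)

    def all_data_files(self):
        """Walk Investigation -> Study -> Assay and collect every data file link."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]
```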

Page 28

Incremental Annotation

Metadata can be added to assets at any time:
Extracted from JERM templates
Added by the data owner through SEEK
Added by another SysMO consortium member with editing permission
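Because metadata can arrive at any time and from several sources, it helps to record who added each item and how. A sketch under assumptions: `Asset`, `Annotation` and the source labels are hypothetical, and later values simply override earlier ones for the same key.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Annotation:
    key: str
    value: str
    source: str        # "jerm-template", "owner" or "member"
    author: str
    added: datetime


@dataclass
class Asset:
    title: str
    owner: str
    editors: set = field(default_factory=set)     # members granted editing permission
    annotations: list = field(default_factory=list)

    def add_extracted(self, metadata):
        """Metadata pulled automatically from a JERM template."""
        for key, value in metadata.items():
            self._add(key, value, "jerm-template", "extractor")

    def annotate(self, user, key, value):
        """Manual annotation by the owner or a member with editing permission."""
        if user == self.owner:
            source = "owner"
        elif user in self.editors:
            source = "member"
        else:
            raise PermissionError(f"{user} has no editing permission")
        self._add(key, value, source, user)

    def _add(self, key, value, source, author):
        self.annotations.append(
            Annotation(key, value, source, author, datetime.now(timezone.utc)))

    def metadata(self):
        """Latest value per key; later annotations override earlier ones."""
        return {a.key: a.value for a in self.annotations}
```

Keeping the full annotation history, rather than only the current values, preserves who changed what when several people enrich the same asset.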

Page 29

In Practice for Spreadsheets

[Diagram: native spreadsheet, JERM template, JERMed spreadsheet]

Page 30

Now

Register → extract → match to the JERM → add metadata

Browse and search the whole record

Page 31

Near future

Register → extract → match to the JERM → add metadata here

Browse and search: whole record, filtered record, enriched record
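The register, extract, match and enrich steps can be read as a pipeline that turns a whole record into a filtered and then an enriched one. A minimal sketch using plain dictionaries; the function names and `JERM_FIELDS` are hypothetical.

```python
# Hypothetical set of JERM field names used to filter an extracted record.
JERM_FIELDS = {"data_type", "measured", "units", "experiment", "sop"}


def register(location):
    """Step 1: register an asset by its location; nothing is copied."""
    return {"location": location}


def extract(registered, raw_metadata):
    """Step 2: the whole record, i.e. everything the extractor found."""
    return {**registered, **raw_metadata}


def match_to_jerm(whole):
    """Step 3: the filtered record, keeping the location and JERM fields only."""
    return {k: v for k, v in whole.items() if k in JERM_FIELDS or k == "location"}


def enrich(filtered, extra):
    """Step 4: the enriched record, with metadata added by people in SEEK."""
    return {**filtered, **extra}
```

Each step returns a new dictionary, so the whole, filtered and enriched records can all be kept and browsed side by side.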

Page 32

Future

Register → extract → match to the JERM → add metadata here

Browse and search collections of records

Meta-analysis

Page 33

What we have done…

[Diagram: SysMO DB overview. The SysMO-SEEK web interface (Assets Catalogue, Yellow Pages, Search) connects consortium Data, Models and Processes (SOPs and workflows) through the Spreadsheet, Models, SOP and Workflow repositories, JERM, JWS Online, a Workflow Management System and public data, using the SBML and Nature Protocols standards]

Page 34

Outstanding Issues

Keeping data at project sites has responsibilities:
Reliability: sites must be available continuously and promptly
Support: sites must be proof against virus attacks, etc.
Archiving: data must persist beyond the lifetime of the project. What happens when a project is no longer part of the SysMO consortium?

Page 35

Lessons

Find a solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work: PhD students, post-docs
Let the scientists retain control over their data and who can see it
Don’t reinvent: use available vocabularies and minimal model standards
Help prevent people duplicating work by linking the people as well as the resources

Page 36

Acknowledgements

SysMO-DB Team, SysMO-PALs
myGrid, EML and JWS Online teams
OMII-UK, Uni Southampton
EMBL-EBI, MCISB