SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium

Carole Goble, Uni of Manchester, UK
Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa
Isabel Rojas, EML Research gGmbH, Germany

Page 1: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium

Carole Goble, Uni of Manchester, UK
Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa
Isabel Rojas, EML Research gGmbH, Germany

Page 2: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Pan European collaboration. Systems Biology of Microorganisms.

The transition from growing to non-growing Bacillus subtilis cells

Energy and Saccharomyces cerevisiae

Biology of Clostridium acetobutylicum

Gene interaction networks and models of cation homeostasis in Saccharomyces cerevisiae

http://www.sysmo.net

Page 3: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Eleven individual projects, 91 institutes

Different research outcomes

A cross-section of microorganisms, incl. bacteria, archaea and yeast.

Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way

Present these processes in the form of computerized mathematical models.

Pool research capacities and know-how.

Already running since April 2007. Runs for 3-5 years.

http://www.sysmo.net

Projects: BaCell-SysMO, COSMIC, SUMO, KOSMOBAC, SysMO-LAB, PSYSMO, Valla, MOSES, TRANSLUCENT, STREAM, SulfoSYS

Page 4: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

The Problem

No one concept of experimentation or modelling

No planned, shared infrastructure for pooling

Page 5: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Own solutions: own data solutions and collaboration environments. Wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke, commercial … files and spreadsheets.

Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.

Data issues: many groups do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains.

Resource issues: no extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.

Page 6: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Started July 2008, 3 years, 3+3 people, 3 teams over 3 sites

Sensitively retrofit a data access, model handling and data integration platform.

Support and manage the diversity of data, models and competencies.

Web-based solution:
  exchange of data, models and processes (intra- and inter-consortia).
  search for data, models and processes across the initiative.
  dissemination of results.


Page 7: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Principles…

1. A series of small victories: low-hanging fruit and early wins.
2. Realistic: ease real pressure points and concerns.
3. Don't reinvent (1): borrow, link up and spread around what the consortia already have.
4. Don't reinvent (2): use what is already available in the open community and off the shelf.
5. Sustainable: flexible, extensible and open.
6. Migrate to standards: encourage standards adoption.

Page 8: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

[Diagram labels: Modellers, Experimentalists, Bioinformaticians, with “Minimum exchange” between them]

Page 9: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Social Approach

Questionnaires
  Ranked projects Bronze, Silver, Gold and Platinum

PALS
  18 postdocs and PhD students
  All three kinds of people
  Our design and technical collaboration team
  Very intense face-to-face and virtual collaboration
  UK and Continental PALS Chapters

Audits and Sharing
  Methods, data, models, standards, software, schemas, spreadsheets, SOPs…

Page 10: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Technical Approach

[Diagram labels: SysMO DB; Experimental data; Models; Processes; SysMO-SEEK web interface; JWS Online; SOPs; Workflows; Public Datasets; Consortium Datasets; Spreadsheets; Assets and Yellow Pages Catalogues]

Page 11: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Discovery: SysMO-SEEK

Single, web-based access point
Single sign-on access control and versioning management
Single search point over the yellow pages and assets catalogue
  People, expertise, SOPs, equipment
  Metadata about data – spreadsheets and databases
  Models (JWS Online), workflows (myExperiment), public web services (BioCatalogue)
Call out to external resources (e.g. PubMed)

Does not hold results; holds metadata on results and links to results – pilot COSMIC consortium

A component for SysMO groups to incorporate in their own environments and applications
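The "metadata and links, not results" idea can be sketched as a small in-memory catalogue. This is only an illustration under assumptions: the class name, field names, owner labels and example.org URLs below are invented for the sketch and are not taken from SysMO-SEEK itself.

```python
from dataclasses import dataclass, field


@dataclass
class AssetEntry:
    """One catalogue entry: descriptive metadata plus a link to the resource.

    The results themselves are not stored here; they stay at the owner's site.
    """
    title: str
    kind: str    # e.g. "dataset", "model", "SOP", "workflow"
    owner: str   # hypothetical owner label
    link: str    # hypothetical URL of the resource at the owner's site
    tags: set = field(default_factory=set)


def search(catalogue, term):
    """Return entries whose title contains the term or that carry it as a tag."""
    term = term.lower()
    return [
        entry for entry in catalogue
        if term in entry.title.lower()
        or term in {tag.lower() for tag in entry.tags}
    ]


# Made-up entries, for illustration only.
catalogue = [
    AssetEntry("Glucose pulse time series", "dataset", "Consortium A",
               "https://example.org/a/glucose-pulse", {"metabolomics"}),
    AssetEntry("Glycolysis model", "model", "Consortium B",
               "https://example.org/b/glycolysis", {"SBML", "glucose"}),
    AssetEntry("Enzyme assay SOP", "SOP", "Consortium C",
               "https://example.org/c/enzyme-assay", {"enzyme activity"}),
]

hits = search(catalogue, "glucose")
for entry in hits:
    print(f"{entry.kind}: {entry.title} -> {entry.link}")
```

A single search over one list of entries stands in for the "single search point"; each hit carries only a pointer to where the data or model actually lives.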

Page 12: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium
Page 13: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

SysMO SEEK (20 questions)

Is there any group generating kinetic data?

Is this data available?

Who is working with which organism?

What methods are being used to determine enzyme activity?

Under which experimental conditions are my partners measuring glucose concentration?
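Questions of this kind amount to simple lookups once people, organisms, data types and methods are recorded in one place. The sketch below assumes a hypothetical yellow-pages structure; the entries, field names and method names are invented for illustration.

```python
from collections import defaultdict

# Made-up yellow-pages entries; the field names are assumptions for this sketch.
yellow_pages = [
    {"person": "Researcher A", "group": "Group 1",
     "organism": "Saccharomyces cerevisiae",
     "data_types": {"kinetic data"},
     "methods": {"enzyme activity": "spectrophotometric assay"}},
    {"person": "Researcher B", "group": "Group 2",
     "organism": "Bacillus subtilis",
     "data_types": {"microarray"},
     "methods": {}},
    {"person": "Researcher C", "group": "Group 1",
     "organism": "Saccharomyces cerevisiae",
     "data_types": {"kinetic data", "metabolomics"},
     "methods": {"enzyme activity": "coupled enzyme assay"}},
]


def groups_generating(entries, data_type):
    """Is there any group generating this type of data?"""
    return sorted({e["group"] for e in entries if data_type in e["data_types"]})


def who_works_with(entries):
    """Who is working with which organism?"""
    by_organism = defaultdict(list)
    for e in entries:
        by_organism[e["organism"]].append(e["person"])
    return dict(by_organism)


def methods_for(entries, measurement):
    """What methods are being used for a given measurement?"""
    return sorted({e["methods"][measurement]
                   for e in entries if measurement in e["methods"]})


print(groups_generating(yellow_pages, "kinetic data"))
print(who_works_with(yellow_pages))
print(methods_for(yellow_pages, "enzyme activity"))
```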

Page 14: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Models

Database of curated models and a model simulator
Web-service enabled, to run from workflows
Separate password-protected websites for each project

Through SEEK…
  Special instance of JWS Online for SysMO
  Validate and run models from SysMO-SEEK and publish later
  Access control as for other assets

Access to other resources (BioModels, COPASI)
Semantic SBML from the TRANSLUCENT project
SBML and MIRIAM education

Publish, manage, run and validate SBML models
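As a rough idea of what a first-pass check on an SBML model might involve, the sketch below parses a toy SBML Level 2 Version 4 document with Python's standard XML library and verifies that every species used in a reaction has been declared. The toy model and the `check_model` helper are assumptions made for illustration; this is not how JWS Online validates models, and a real check would use a full SBML library such as libSBML.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# A toy one-reaction model, invented for this sketch.
TOY_MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_step">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cell" initialConcentration="5"/>
      <species id="g6p" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase" reversible="false">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def check_model(sbml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(sbml_text)
    if root.tag != f"{{{SBML_NS}}}sbml":
        return ["root element is not an SBML Level 2 Version 4 <sbml> element"]
    ns = {"s": SBML_NS}
    declared = {sp.get("id") for sp in root.findall(".//s:species", ns)}
    problems = []
    for ref in root.findall(".//s:speciesReference", ns):
        if ref.get("species") not in declared:
            problems.append(f"undeclared species: {ref.get('species')}")
    return problems


print(check_model(TOY_MODEL))
```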

Page 15: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Processes

Protocols and SOPs
  SOP assets deposited or linked to
  SOP gathering
  Nature Protocols format recommendation
  High-level classification for indexing and tagging
  Got a few, need more.

Page 16: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Processes

Protocol fields (Nature Protocols format recommendation):
  Title
  Authors
  Keywords
  Abstract
  Materials
    Reagents
    Reagent Set Up
    Equipment
  Time Taken
  Procedure
  Troubleshooting
  Critical Steps
  Anticipated Results
  References
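A template like this lends itself to a simple completeness check at deposit time. The sketch below takes the field names from the slide; the `missing_fields` helper and the draft SOP are hypothetical and only show the idea.

```python
# Field names as listed on the slide, with the Materials sub-fields flattened.
SOP_FIELDS = [
    "Title", "Authors", "Keywords", "Abstract",
    "Reagents", "Reagent Set Up", "Equipment",
    "Time Taken", "Procedure", "Troubleshooting",
    "Critical Steps", "Anticipated Results", "References",
]


def missing_fields(sop):
    """Return the template fields that are absent or blank in a draft SOP."""
    return [name for name in SOP_FIELDS if not str(sop.get(name, "")).strip()]


# A made-up, partly filled-in draft.
draft = {
    "Title": "Measuring enzyme activity in cell extracts",
    "Authors": "A. Researcher",
    "Keywords": "enzyme activity, cell extract",
    "Procedure": "1. Prepare extract. 2. Run assay.",
    "References": "   ",  # blank entries count as missing
}

print(missing_fields(draft))
```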

Page 17: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Processes Deposition

Page 18: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Workflow Management System

Bioinformatics Processes: Workflows

Automated, repeatable and shareable specification for linking and running multiple computational tasks.
Transparent provenance log of execution and results.
Chaining together distributed analysis tools and data sources:
  Annotation pipelines, data analysis pipelines, text mining, data integration, simulation sweeps
  SBML model construction and population
Data sets and tools accessible to a workflow engine – Web Services, R scripts, BioMart, Java libraries, Grid Services, (MATLAB in beta)

Free and Open Source
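The core idea, a chain of tasks that records what each step consumed and produced, can be sketched in a few lines. This is a toy runner written for illustration, not Taverna or its API; the step functions and the sample readings are invented.

```python
import time


def run_workflow(steps, data):
    """Run the steps in order, feeding each output into the next step.

    Returns the final result and a provenance log with one record per step.
    """
    provenance = []
    for name, func in steps:
        started = time.time()
        result = func(data)
        provenance.append({
            "step": name,
            "input": data,
            "output": result,
            "seconds": time.time() - started,
        })
        data = result
    return data, provenance


# Hypothetical analysis steps for a set of optical density readings.
def parse_readings(text):
    return [float(value) for value in text.split(",")]


def subtract_blank(values, blank=0.25):
    return [value - blank for value in values]


def mean(values):
    return sum(values) / len(values)


steps = [
    ("parse", parse_readings),
    ("subtract_blank", subtract_blank),
    ("mean", mean),
]

result, provenance = run_workflow(steps, "0.5,1.0,1.5,2.0")
print(result)
for record in provenance:
    print(record["step"], record["input"], "->", record["output"])
```

Because the specification is just an ordered list of named steps, the same list can be rerun on new input or shared with another group, and the log shows exactly which intermediate values led to the result.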

Page 19: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Manipulation of SBML models in workflows

libSBML: data integration & constructing and annotating SBML models

Page 20: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Already in use by individual groups for Research

Ramp up when more data resources become workflow accessible

Libraries of SysMO workflows

Page 21: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Experimental Data Comparison and Exchange

Public data sources
  Model organism databases (e.g. SGD)
  BRENDA …
Data produced by SysMO
  SABIO-RK, iChiP, MeMo …
Local databases and files
  Remain at the sites, with control retained by the groups.
Excel spreadsheets
  The most common format for experimental data.

[Diagram labels: SEEK repository asset; Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet]

Page 22: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Just Enough Results Model (JERM)

Minimum metadata for SysMO exchange; what an experiment is.

Extract metadata from datasets for the Assets catalogue – exchange
Ontologies and controlled vocabularies for annotation
Expose data results through a JERM interface – access
Access controlled by consortia, groups and individuals

Harvesting standards, current practice and consortium schemas and spreadsheets

Inspired by MCISB Key Results initiative and SBRML [Paton]

[Diagram labels: SysMO SEEK; Access Control; JERM Web Service Access Interface; JERM Extractor and Access Wrapper; Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet]
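Access controlled at three levels (consortium, group, individual) could look roughly like the sketch below. The `Policy` and `User` classes, the wrapper function and all names in the example are assumptions made for illustration; the slides do not describe how the JERM access wrapper is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    group: str
    consortium: str


@dataclass
class Policy:
    """Who may see a result set: whole consortia, groups, or named individuals."""
    consortia: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    individuals: set = field(default_factory=set)


def can_access(user, policy):
    return (user.name in policy.individuals
            or user.group in policy.groups
            or user.consortium in policy.consortia)


def get_results(user, record):
    """Hypothetical access wrapper: hand out the result link only if permitted."""
    if not can_access(user, record["policy"]):
        raise PermissionError(f"{user.name} may not access {record['title']}")
    return record["link"]


# Made-up record and users.
record = {
    "title": "Growth curves, batch 3",
    "link": "https://example.org/group1/growth-curves-3",
    "policy": Policy(groups={"Group 1"}, individuals={"Researcher Z"}),
}
insider = User("Researcher A", "Group 1", "Consortium A")
invited = User("Researcher Z", "Group 9", "Consortium C")
outsider = User("Researcher B", "Group 2", "Consortium A")

print(get_results(insider, record))
print(can_access(outsider, record["policy"]))
```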

Page 23: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

JERM First Cut

General
  What type of data is it: microarray, growth curve, enzyme activity…
  What was measured: gene expression, OD, metabolite concentration…
  What do the values in the datasets mean: units, time series, repeats…

Data type specific
  Each data type has a different “minimal model”
  Phase 1 – Microarray and Metabolomics
  Careful mapping to the MIBBI standards (e.g. MIAME)

Experiment binding
  Each individual results set is bound to an experiment/investigation for exchange across different types of data
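The three parts of this first cut (general fields, a type-specific "minimal model", and a binding to an experiment) map naturally onto a small record type. The sketch below is an assumption about how such a record might be shaped; in particular the type-specific field names are placeholders and are not taken from MIAME or any other MIBBI checklist.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Placeholder requirements per data type, invented for this sketch.
TYPE_SPECIFIC_FIELDS = {
    "microarray": {"array_platform", "normalisation"},
    "metabolomics": {"instrument", "extraction_method"},
}


@dataclass
class JermRecord:
    # General part
    data_type: str        # what type of data it is
    measured: str         # what was measured
    units: str            # what the values mean
    is_time_series: bool
    repeats: int
    # Experiment binding
    experiment_id: str
    # Data-type-specific part
    type_specific: dict = field(default_factory=dict)

    def missing_type_specific(self):
        """Type-specific fields required for this data type but not supplied."""
        required = TYPE_SPECIFIC_FIELDS.get(self.data_type, set())
        return sorted(required - set(self.type_specific))


def group_by_experiment(records):
    """Bring together results of different data types from the same experiment."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.experiment_id].append(record.data_type)
    return dict(grouped)


# Made-up records.
records = [
    JermRecord("microarray", "gene expression", "log2 ratio", True, 3, "EXP-1",
               {"array_platform": "platform X"}),
    JermRecord("metabolomics", "metabolite concentration", "mM", True, 3, "EXP-1",
               {"instrument": "LC-MS", "extraction_method": "cold methanol"}),
    JermRecord("growth curve", "OD", "OD600", True, 2, "EXP-2"),
]

print(records[0].missing_type_specific())
print(group_by_experiment(records))
```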

Page 24: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

[Diagram labels: User's local file store; XML; SysMO SEEK Assets catalogue; Corresponding JERM schema; Tag; Metadata of the file and information about what is measured; Controlled vocabulary plug-in; Source and sink for workflows; Controlled deposit in spreadsheet repository; Local spreadsheet repository]
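One plausible reading of this diagram is that a local spreadsheet is tagged against a schema and a small XML description of it is sent to the catalogue. The sketch below follows that reading under stated assumptions: the spreadsheet is represented as CSV text, and the element names, the vocabulary and the file name are all invented, since the slides do not show the JERM schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for a controlled vocabulary of data types (hypothetical).
DATA_TYPE_VOCABULARY = {"growth curve", "microarray", "enzyme activity"}

# A made-up spreadsheet, exported as CSV.
CSV_TEXT = "time_min,OD600\n0,0.05\n30,0.11\n60,0.24\n"


def extract_metadata(csv_text, file_name, data_type):
    """Build an XML description of a spreadsheet without copying its values."""
    if data_type not in DATA_TYPE_VOCABULARY:
        raise ValueError(f"'{data_type}' is not in the controlled vocabulary")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    row_count = sum(1 for _ in reader)

    entry = ET.Element("assetMetadata")  # element names are assumptions
    ET.SubElement(entry, "file").text = file_name
    ET.SubElement(entry, "dataType").text = data_type
    ET.SubElement(entry, "rows").text = str(row_count)
    columns = ET.SubElement(entry, "columns")
    for name in header:
        ET.SubElement(columns, "column").text = name
    return entry


entry = extract_metadata(CSV_TEXT, "growth_batch3.csv", "growth curve")
print(ET.tostring(entry, encoding="unicode"))
```

Only the column names and row count leave the local file store in this sketch; the measured values stay in the spreadsheet.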

Page 25: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

JERM Exchange Pilot, Spring 2009

SysMO-LAB, COSMIC, MOSES, BaCell-SysMO

“20 questions”

Page 26: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

[Architecture diagram. Layer labels: Repositories & Resources; Service Interface; Integration; Discovery, Access, Annotation & Collaboration. Component labels: SysMO SEEK; Yellow Pages; Assets; Workflows; Results Cache; Taverna Workflows; JERM Web Service Access Interface; JERM Ext & Wrap; Web Service Access Interface; Access Control; SysMO Data; Models; External Resources; myExperiment; JWS Online; SABIO-RK; BioCatalogue; Metadata]

Page 27: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Related initiatives and sources

OpenWetWare, Cold Spring Harbor Protocols
MIBBI, National Center for Biomedical Ontology, OBO Foundry
WikiPathways, Pathway Commons, StrainInfo, ONDEX
PubMed

Page 28: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Training and Know-how

SysMO-DB
  Training on databases, models, workflow systems and web services, and best practice for annotating resources with metadata.
  Kick-starting toolkits, workflows and SOP templates
  Summer schools

SysMO consortium (esp. PALS)
  Social networking for shared content, know-how and best practice
  Contribution
  Best-of-breed solutions in place already

Page 29: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Summary

SysMO-DB is an exercise in:

Sensitively retrofitting a data access, model handling and data integration platform.

Supporting the diversity of data, models and competencies

Social mediation and manipulation

Towards Just Enough™ exchange

Page 30: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Acknowledgements

SysMO-DB Team
SysMO-PALS
myGrid, EML and JWS Online teams
OMII-UK, Uni Southampton
EBI, MCISB

Page 31: SysMo-DB:  Towards “just enough” data exchange for the SysMO Consortium

Links

myExperiment: http://www.myexperiment.org
Taverna: http://www.mygrid.org.uk
JWS Online: http://jjj.biochem.sun.ac.za/
SABIO-RK: http://sabio.villa-bosch.de/