SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium
Carole Goble, Uni of Manchester, UK
Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa
Isabel Rojas, EML Research gGmbH, Germany
Pan-European collaboration. Systems Biology of Microorganisms.
The transition from growing to non-growing Bacillus subtilis cells
Energy and Saccharomyces cerevisiae
Biology of Clostridium acetobutylicum
Gene interaction networks and models of cation homeostasis in Saccharomyces cerevisiae
http://www.sysmo.net
Eleven individual projects, 91 institutes
Different research outcomes
A cross-section of microorganisms, incl. bacteria, archaea and yeast.
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models.
Pool research capacities and know-how.
Already running since April 2007. Runs for 3-5 years.
BaCell-SysMO COSMIC
SUMO KOSMOBAC SysMO-LAB
PSYSMO Valla
MOSES TRANSLUCENT
STREAM SulfoSYS
The Problem
No one concept of experimentation or modelling
No planned, shared infrastructure for pooling
Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial …), files and spreadsheets.
Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.
Data issues: many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains.
Resource issues: no extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.
Started July 2008, 3 years, 3+3 people, 3 teams over 3 sites
Sensitively retrofit a data access, model handling and data integration platform.
Support and manage the diversity of data, models and competencies.
Web-based solution:
  Exchange of data, models and processes (intra- and inter-consortia).
  Search for data, models and processes across the initiative.
  Dissemination of results.
Principles
1. A series of small victories: low-hanging fruit and early wins.
2. Realistic: ease real pressure points and concerns.
3. Don't reinvent (1): borrow, link up and spread around what the consortia already have.
4. Don't reinvent (2): use what is already available in the open community and off the shelf.
5. Sustainable: flexible, extensible and open.
6. Migrate to standards: encourage standards adoption.
[Diagram: Modellers, Experimentalists and Bioinformaticians, linked by "Minimum exchange"]
Social Approach
Questionnaires
  Ranked projects Bronze, Silver, Gold and Platinum
PALS
  18 postdocs and PhD students
  All three kinds of people
  Our design and technical collaboration team
  Very intense face-to-face and virtual collaboration
  UK and Continental PALS Chapters
Audits and Sharing
  Methods, data, models, standards, software, schemas, spreadsheets, SOPs…
[Diagram labels: Experimental data; Models; Processes; SysMO-DB]
Technical Approach
SysMO-SEEK web interface
JWS Online
SOPs
Workflows
Public Datasets
Consortium Datasets
Spreadsheets
Assets and Yellow Pages Catalogues
Discovery: SysMO-SEEK
Single, web-based access point
Single sign-on access control and versioning management
Single search point over yellow pages and assets catalogue
  People, expertise, SOPs, equipment
  Metadata about data: spreadsheets and databases
  Models (JWS Online), workflows (myExperiment), public web services (BioCatalogue)
Call out to external resources (e.g. PubMed)
Does not hold results; holds metadata on results and links to results. Pilot: COSMIC consortium.
A component for SysMO groups to incorporate in their own environments and applications
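The "metadata and links, not results" idea can be illustrated with a small sketch. This is not the SysMO-SEEK implementation; the class names, fields, record title and URL below are all invented for illustration. The catalogue stores only a description of each asset and a pointer to where the owning group keeps it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """Metadata about a result plus a link to where the result is held."""
    title: str
    asset_type: str      # e.g. "data", "model", "SOP", "workflow"
    consortium: str
    link: str            # location controlled by the owning group
    keywords: tuple = ()


class AssetsCatalogue:
    """Holds metadata records only; the results stay at the linked location."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, term):
        """Case-insensitive match against titles and keywords."""
        needle = term.lower()
        return [
            r for r in self._records
            if needle in r.title.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]


catalogue = AssetsCatalogue()
catalogue.register(AssetRecord(
    title="Glucose uptake time series",               # invented example record
    asset_type="data",
    consortium="COSMIC",
    link="https://example.org/cosmic/glucose.xls",    # placeholder URL
    keywords=("glucose", "kinetics"),
))
hits = catalogue.search("kinetic")
print([(h.title, h.link) for h in hits])
```

A search returns records and links; fetching the actual data is left to whatever access the owning group grants.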
SysMO SEEK (20 questions)
Is there any group generating kinetic data?
Is this data available?
Who is working with which organism?
What methods are being used to determine enzyme activity?
Under which experimental conditions are my partners measuring glucose concentration?
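Questions of this kind reduce to simple lookups once people, projects and organisms are recorded in a yellow-pages catalogue. The sketch below assumes a flat list of entries; the entry fields, researcher names and project names are hypothetical and do not describe real SysMO data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class YellowPagesEntry:
    person: str
    project: str
    organism: str
    data_types: tuple   # kinds of data the person generates


def who_works_with(entries):
    """Answer 'Who is working with which organism?'."""
    by_organism = defaultdict(set)
    for entry in entries:
        by_organism[entry.organism].add(entry.person)
    return {organism: sorted(people) for organism, people in by_organism.items()}


def groups_generating(entries, data_type):
    """Answer 'Is there any group generating <data_type> data?'."""
    return sorted({e.project for e in entries if data_type in e.data_types})


# Invented entries, for illustration only.
entries = [
    YellowPagesEntry("Researcher A", "Project X", "Bacillus subtilis", ("kinetic",)),
    YellowPagesEntry("Researcher B", "Project Y", "Saccharomyces cerevisiae", ("microarray",)),
    YellowPagesEntry("Researcher C", "Project Y", "Bacillus subtilis", ("growth curve",)),
]

print(who_works_with(entries))
print(groups_generating(entries, "kinetic"))
```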
Models
Database of curated models and a model simulator
Web service enabled, to run from workflows
Separate password-protected websites for each project
Through SEEK…
  Special instance of JWS Online for SysMO
  Validate and run models from SysMO-SEEK and publish later
  Access control as for other assets
Access to other resources (BioModels, COPASI)
Semantic SBML from the TRANSLUCENT project
SBML and MIRIAM education
Publish, manage, run, validate SBML models
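As a rough illustration of what a first structural check on a submitted SBML file might look like, the sketch below uses only the Python standard library. It is not SBML validation in the libSBML sense and is not how JWS Online or SysMO-SEEK validate models; the function names and the toy model are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_sbml(xml_text):
    """Check the basic document shape and list the species identifiers."""
    root = ET.fromstring(xml_text)
    if local(root.tag) != "sbml":
        raise ValueError("root element is not <sbml>")
    model = next((child for child in root if local(child.tag) == "model"), None)
    if model is None:
        raise ValueError("no <model> element found")
    species = [el.get("id") for el in model.iter() if local(el.tag) == "species"]
    return {
        "level": root.get("level"),
        "version": root.get("version"),
        "model_id": model.get("id"),
        "species": species,
    }


# Toy model invented for this example.
EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfSpecies>
      <species id="glucose" compartment="cell"/>
      <species id="pyruvate" compartment="cell"/>
    </listOfSpecies>
  </model>
</sbml>"""

print(summarize_sbml(EXAMPLE))
```

A check like this only catches malformed or non-SBML uploads; semantic validation and simulation would still need a dedicated SBML tool.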
Experimental Processes
Protocols and SOPs
  SOP assets deposited or linked to
  SOP gathering
  Nature Protocols format recommendation
  High-level classification for indexing and tagging
  Got a few, need more.
Experimental Processes
Protocol
  Title
  Authors
  Keywords
  Abstract
  Materials
    Reagents
    Reagent Set Up
    Equipment
  Time Taken
  Procedure
  Troubleshooting
  Critical Steps
  Anticipated Results
  References
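A template like this lends itself to a simple completeness check at deposition time. The sketch below takes the section names from the slide; treating every section as mandatory, and representing an SOP as a plain dictionary, are assumptions made for illustration.

```python
# Section names as listed on the slide; requiring all of them is an assumption.
SECTIONS = (
    "Title", "Authors", "Keywords", "Abstract",
    "Reagents", "Reagent Set Up", "Equipment",
    "Time Taken", "Procedure", "Troubleshooting",
    "Critical Steps", "Anticipated Results", "References",
)


def missing_sections(sop):
    """Return the template sections that are absent or blank in an SOP."""
    return [s for s in SECTIONS if not str(sop.get(s, "")).strip()]


# Invented draft with most sections still to be written.
draft = {
    "Title": "Enzyme activity assay",
    "Authors": "A. Researcher",
    "Procedure": "   ",   # whitespace only, so it counts as missing
}

print(missing_sections(draft))
```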
Experimental Processes Deposition
Workflow Management System
Bioinformatics Processes: Workflows
Automated, repeatable and shareable specification for linking and running multiple computational tasks.
Transparent provenance log of execution and results.
Chaining together distributed analysis tools and data sources:
  Annotation pipelines, data analysis pipelines, text mining, data integration, simulation sweeps
  SBML model construction and population
Data sets and tools accessible to a workflow engine: Web Services, R scripts, BioMART, Java libraries, Grid Services (MATLAB in beta)
Free and Open Source
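The core idea of a workflow (a named chain of tasks plus a provenance log of what ran and what it produced) can be sketched in a few lines. This is a plain-Python stand-in, not the API of any workflow management system; the task names, the blank value and the input string are invented for the example.

```python
from datetime import datetime, timezone


def run_workflow(steps, data):
    """Run tasks in order, feeding each output to the next, and log every step."""
    provenance = []
    for name, task in steps:
        started = datetime.now(timezone.utc).isoformat()
        result = task(data)
        provenance.append({
            "step": name,
            "started": started,
            "input": repr(data),
            "output": repr(result),
        })
        data = result
    return data, provenance


def parse(text):
    """Turn a comma-separated string of readings into floats."""
    return [float(x) for x in text.split(",")]


def subtract_blank(values, blank=0.5):
    """Subtract a fixed blank reading (hypothetical value) from each measurement."""
    return [v - blank for v in values]


def mean(values):
    return sum(values) / len(values)


STEPS = [("parse", parse), ("subtract_blank", subtract_blank), ("mean", mean)]

final, log = run_workflow(STEPS, "1.5,2.5,3.5")
for entry in log:
    print(entry["step"], "->", entry["output"])
```

Because the step list is data, the same specification can be shared and rerun, and the log records which inputs produced which outputs.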
Manipulation of SBML models in workflows
libSBML: data integration & constructing and annotating SBML models
Already in use by individual groups for Research
Ramp up when more data resources become workflow accessible
Libraries of SysMO workflows
Experimental Data Comparison and Exchange
Public data sources
  Model organism databases (e.g. SGD)
  BRENDA …
Data produced by SysMO
  SABIO-RK, iChiP, MeMo …
Local databases and files
  Remain at the sites; the groups retain control.
Excel spreadsheets
  The most common format for experimental data.
[Diagram labels: SEEK repository asset; Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet]
Minimum metadata for SysMO exchange; what an experiment is.
Extract metadata from datasets for the Assets catalogue (exchange)
Ontologies and controlled vocabularies for annotation
Expose data results through a JERM interface (access)
Access controlled by consortia, groups and individuals
Harvesting standards, current practice and consortium schemas and spreadsheets
Inspired by MCISB Key Results initiative and SBRML [Paton]
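Access controlled at three levels (consortium, group, individual) could be modelled as below. This is a minimal sketch under assumed semantics: access is denied by default and granted if any one level matches. The class names, field names and example users are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    group: str
    consortium: str


@dataclass
class AccessPolicy:
    """Who may see an asset; an empty policy allows nobody."""
    individuals: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    consortia: set = field(default_factory=set)

    def allows(self, user):
        return (
            user.name in self.individuals
            or user.group in self.groups
            or user.consortium in self.consortia
        )


# Invented users and policy, for illustration only.
alice = User("alice", group="Group 1", consortium="Consortium A")
bob = User("bob", group="Group 2", consortium="Consortium B")

policy = AccessPolicy(groups={"Group 1"}, individuals={"carol"})
print(policy.allows(alice), policy.allows(bob))
```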
[Diagram labels: Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet; JERM Web Service Access Interface; JERM Extractor and Access Wrapper; Access Control; SysMO-SEEK]
Just Enough Results Model (JERM): First Cut
General
  What type of data is it: microarray, growth curve, enzyme activity…
  Each data type has a different “minimal model”
  Phase 1: microarray and metabolomics
  Careful mapping to the MIBBI standards (e.g. MIAME)
Data type specific
  What was measured: gene expression, OD, metabolite concentration…
  What do the values in the datasets mean: units, time series, repeats…
Experiment binding
  Each individual result set is bound to an experiment/investigation for exchange across different types of data
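One way to picture a "just enough" record is a small general part (data type, what was measured, units, experiment binding) plus a per-type minimal model. The sketch below is an illustration only: the slides do not list the type-specific fields, so the entries in `MINIMAL_FIELDS`, the field names and the example values are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical per-type minimal fields; the real minimal models are not given here.
MINIMAL_FIELDS = {
    "microarray": {"platform", "sample"},
    "growth curve": {"medium", "time_unit"},
}


@dataclass
class JermRecord:
    data_type: str        # what type of data is it
    measured: str         # what was measured
    units: str            # what the values mean
    experiment_id: str    # binding to an experiment/investigation
    type_specific: dict = field(default_factory=dict)

    def problems(self):
        """List what is missing for this record to be exchangeable."""
        issues = []
        for name in ("data_type", "measured", "units", "experiment_id"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        required = MINIMAL_FIELDS.get(self.data_type)
        if required is None:
            issues.append(f"no minimal model defined for {self.data_type!r}")
        else:
            absent = required - set(self.type_specific)
            issues.extend(f"missing {key}" for key in sorted(absent))
        return issues


record = JermRecord(
    data_type="growth curve",
    measured="OD",
    units="dimensionless",
    experiment_id="exp-001",
    type_specific={"medium": "minimal medium", "time_unit": "min"},
)
print(record.problems())
```

Keeping the general part tiny is what allows result sets of different types to be compared through their shared experiment binding.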
[Diagram: spreadsheet metadata extraction. Labels: User's local file store; XML; SysMO-SEEK Assets catalogue; Corresponding JERM schema; Tag; Metadata of the file and information about what is measured; Controlled vocabulary plug-in; Source and sink for workflows; Controlled deposit in spreadsheet repository; Local spreadsheet repository]
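The extraction step in this diagram (spreadsheet in, XML metadata out) can be sketched with the Python standard library. The sketch reads CSV text rather than Excel, and the XML element and attribute names are invented; they are not the JERM schema. The "name (unit)" column header convention is also an assumption.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Assumed header convention: "name (unit)", with the unit part optional.
HEADER = re.compile(r"^\s*(?P<name>[^()]+?)\s*(?:\((?P<unit>[^()]*)\))?\s*$")


def extract_metadata(filename, csv_text):
    """Describe a tabular file (not its values) as a small XML document."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("empty spreadsheet")
    header, body = rows[0], rows[1:]
    root = ET.Element("dataset", {"file": filename, "rows": str(len(body))})
    for cell in header:
        match = HEADER.match(cell)
        name = match.group("name") if match else cell.strip()
        unit = (match.group("unit") or "") if match else ""
        ET.SubElement(root, "column", {"name": name, "unit": unit})
    return ET.tostring(root, encoding="unicode")


# Invented growth-curve data, for illustration only.
SHEET = "time (min),OD600\n0,0.05\n30,0.11\n"

print(extract_metadata("growth.csv", SHEET))
```

Only the description leaves the local file store; the measured values stay in the spreadsheet under the owning group's control.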
JERM Exchange Pilot, Spring 2009
SysMO-LAB
COSMIC
MOSES
BaCell-SysMO
“20 questions”
Yellow Pages
[Architecture diagram. Layer labels: Repositories & Resources; Service Interface; Integration; Discovery, Access, Annotation & Collaboration. Component labels: JERM Web Service Access Interface; JERM Extractor & Wrapper; SysMO Data Models; Metadata; External Resources; Web Service Access Interface; Taverna Workflows; SysMO-SEEK; Workflows; Assets; Results Cache; myExperiment; JWS Online; SABIO-RK; BioCatalogue; Access Control]
Related initiatives and sources
OpenWetWare
Cold Spring Harbor Protocols
MIBBI
National Center for Biomedical Ontology
OBO Foundry
WikiPathways
Pathway Commons
StrainInfo
ONDEX
PubMed
Training and Know-how
SysMO-DB
  Training on databases, models, workflow systems and web services, and best practice for annotating resources with metadata.
  Kick-starting toolkits, workflows and SOP templates
  Summer schools
SysMO consortium (esp. PALS)
  Social networking for shared content, know-how and best practice
  Contribution
  Best-of-breed solutions in place already
Summary
SysMO-DB is an exercise in:
  Sensitively retrofitting a data access, model handling and data integration platform.
  Supporting the diversity of data, models and competencies.
  Social mediation and manipulation.
Towards Just Enough™ exchange
Acknowledgements
SysMO-DB Team
SysMO-PALS
myGrid, EML and JWS Online teams
OMII-UK, Uni Southampton
EBI, MCISB
Links
myExperiment: http://www.myexperiment.org
Taverna: http://www.mygrid.org.uk
JWS Online: http://jjj.biochem.sun.ac.za/
SABIO-RK: http://sabio.villa-bosch.de/