SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium
Katy Wolstencroft, University of Manchester, UK
Pan-European collaboration
  Eleven individual projects, 91 institutes
Different research outcomes
  A cross-section of microorganisms, including bacteria, archaea and yeast
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how
Running since April 2007; runs for 3-5 years
http://www.sysmo.net
Systems Biology of Microorganisms
The Problem
No one concept of experimentation or modelling
No planned, shared infrastructure for pooling
Started July 2008: 3 years, 3 staff + 3 investigators, 3 teams over 3 sites
Sensitively retrofit a data access, model handling and data integration platform.
Support and manage the diversity of data, models and competencies.
Web-based solution:
  exchange of data, models and processes (intra- and inter-consortia)
  search for data, models and processes across the initiative
  dissemination of results
SysMO-DB
Own solutions
  Own data solutions and collaboration environments: wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.
Suspicion
  Suspicion and caution over sharing.
  Interesting interplay between modellers, experimentalists and bioinformaticians.
Data issues
  Many do not follow the standards that exist, or know who is doing what.
Resource issues
  No extra resources for the consortia.
  91 institutes, 11 consortia, some overlapping.
Types of data
Multiple omics: genomics, transcriptomics, proteomics, metabolomics
Images
Reaction kinetics
Models
Relationships between data sets/experiments
Procedures, experiments, data, results and models
Analysis of data
The same across many Systems Biology projects
Principles…
A series of small victories
Realistic
Don't reinvent
Sustainable and extensible
Migrate to standards
Provide instant gratification
Address doubt and anxiety
Incremental development
The Lowest Hanging Fruit
A Catalogue of SysMO assets
SysMO Yellow Pages
  The people and their expertise
  The institutions and their facilities
Data – experimental data sets
Data – analysed results
Data – external reference data sets
Models
Processes – laboratory protocols and bioinformatics analyses
The catalogue references assets held elsewhere
[Diagram labels: Data, Models, Processes, SysMO DB]
Technical Approach
SysMO-SEEK web interface
Assets and Yellow Pages Catalogues
JERM
Social Approach: PALS
21 postdocs and PhD students
Experimentalists, modellers and bioinformaticians
Our design and technical collaboration team
Very intense face-to-face and virtual collaboration
UK and Continental PALS Chapters
Audits and Sharing
  Methods, data, models, standards, software, schemas, spreadsheets, SOPs…
Communication via PALs
DB team PALS Projects
Show what is there
Suggest what is possible
Ask for requirements
Give requirements
Tell priorities
Rate outcomes
Suggest improvements
Double check
Transmit
Disseminate
Collect answers
Discovery SysMO-SEEK
Single, web-based access point
Access control & versioning management
Yellow pages (“who is who”)
  People, Expertise, Equipment
Assets catalogue (“who has what”)
  SOPs, spreadsheets, pre-published models
  Metadata about data held by projects
Access to other repositories
  Models (JWS Online), Workflows (myExperiment), Public web services (BioCatalogue)
Call out to external resources, e.g. PubMed
Does not hold data and results
Holds metadata on results and links to results
A component for SysMO groups to incorporate in their own environments and applications
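A catalogue that holds metadata and links, while the results stay at the project site, can be sketched roughly as follows. This is only an illustration: the class name, field names and example URLs are hypothetical and are not taken from SysMO-SEEK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRef:
    """A catalogue entry: metadata plus a link, never the data itself."""
    title: str
    kind: str       # e.g. "data", "model", "sop" (hypothetical labels)
    owner: str
    location: str   # where the asset lives at the project site
    keywords: tuple = ()


def search(catalogue, term):
    """Return entries whose title or keywords contain the term."""
    term = term.lower()
    return [
        ref for ref in catalogue
        if term in ref.title.lower()
        or any(term in keyword.lower() for keyword in ref.keywords)
    ]


# Hypothetical entries; the URLs are placeholders.
catalogue = [
    AssetRef("Glucose pulse growth curves", "data", "alice",
             "https://example.org/projectA/growth.xls", ("OD", "yeast")),
    AssetRef("Glycolysis model", "model", "bob",
             "https://example.org/projectB/glycolysis.xml", ("SBML",)),
]
hits = search(catalogue, "sbml")
```

The search only touches the metadata, so a project can withdraw or move its files without the catalogue ever having held a copy.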
Sharing Policies
Default: private until you say otherwise
Project defaults
  Private
  Share with the group
  Share with project
  Share with SysMO
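The "private until you say otherwise" rule with project defaults might look like the sketch below. The four levels come from the slide; the resolution order (owner's choice, then project default, then private) and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Scope(IntEnum):
    """Sharing levels from the slide, ordered narrowest to widest."""
    PRIVATE = 0
    GROUP = 1
    PROJECT = 2
    SYSMO = 3


@dataclass
class Asset:
    title: str
    project: str
    scope: Optional[Scope] = None  # None: the owner has not chosen yet


def effective_scope(asset, project_defaults):
    """Owner's choice wins; otherwise the project default; otherwise private."""
    if asset.scope is not None:
        return asset.scope
    return project_defaults.get(asset.project, Scope.PRIVATE)


def can_view(asset, viewer_scope, project_defaults):
    """viewer_scope is the narrowest circle the viewer shares with the owner
    (PRIVATE means the viewer is the owner)."""
    return viewer_scope <= effective_scope(asset, project_defaults)


# Hypothetical project default: COSMIC shares within the project.
defaults = {"COSMIC": Scope.PROJECT}
curve = Asset("growth curve", "COSMIC")
draft = Asset("draft model", "MOSES")
```

An asset from a project with no configured default falls back to private, which matches the cautious default on the slide.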
“Just Enough” Exchange of SysMO Assets
Experimental Processes
Protocol fields: Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References
Protocols and SOPs
  Nature Protocols format recommendation
  You can upload protocols in any format, but if you use this one, we will index it and make searching easier
Encouraging standardisation
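The "any format is accepted, but the recommended one gets indexed" idea can be illustrated with a small sketch. The field list is the one on the slide; the function and its matching rules are hypothetical and do not describe how SEEK indexes protocols.

```python
RECOMMENDED_FIELDS = (
    "Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
)


def split_protocol(protocol):
    """Split an uploaded protocol (a dict of section name to text) into
    the recommended fields that can be indexed and those that are missing.
    Section names are matched case-insensitively; other sections are ignored."""
    by_key = {key.strip().lower(): value for key, value in protocol.items()}
    indexable = {}
    missing = []
    for field_name in RECOMMENDED_FIELDS:
        value = by_key.get(field_name.lower())
        if isinstance(value, str) and value.strip():
            indexable[field_name] = value.strip()
        else:
            missing.append(field_name)
    return indexable, missing


# Hypothetical upload with two recommended sections and one free-form one.
indexable, missing = split_protocol(
    {"title": "Glucose pulse", "Procedure": " step 1 ", "Notes": "free text"}
)
```

Nothing is rejected here: a protocol with no recommended sections is still accepted, it simply contributes nothing to the index.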
Workflow Management System
Bioinformatics Processes: Workflows
  Data preparation, annotation and analysis pipelines
  SBML model construction and population
  Linking together data sets, Web Services, R scripts, BioMART, Java libraries, Grid Services (MATLAB in beta)
Workflows as a mechanism for linking inside SEEK
Free and Open Source
Libraries of SysMO workflows
Models
SBML is the recommended format
  Not all models are SBML
JWS Online allows storing and simulation of SBML models
  But all models need to be shared
  JWS Online doesn’t have version and access control
Models can be shared in SEEK instead of directly in JWS Online
Can still connect to JWS Online and run simulations
Models
JWS Online – a database of curated models and a model simulator
  Web service enabled to run from workflows
Used and accessed through SEEK
  Special instance of JWS Online for SysMO
  Store, validate and run models from SysMO-SEEK and publish later
Access to other model resources
  Biomodels, Copasi and Semantic SBML
Data Comparison and Exchange
Public data sources
  Model organism databases (e.g. SGD)
  BRENDA …
Data produced by SysMO
  SABIO-RK, iChiP, MeMo …
Local databases & files
  Excel spreadsheets: the most common format for experimental data
[Diagram labels: Metadata; Proteomics, Metabolomics, Microarray, Single Cell Data]
Variable descriptions of data
Little adoption of community controlled vocabulary terms
JERM
JERM: “Just Enough Results Model”
Minimum information to exchange data
  What type of data is it?
    Microarray, growth curve, enzyme activity…
  What was measured?
    Gene expression, OD, metabolite concentration…
  What do the values in the datasets mean?
    Units, time series, repeats…
  Which experiment does it relate to?
  How was the data created?
    SOPs and protocols
Harvesting standards, current practice and consortium schemas and spreadsheets
Inspired by MCISB Key Results initiative and SBRML [Paton]
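The five JERM questions above can be read as the required fields of a small record. The sketch below is one possible reading; the field names and the example values are hypothetical and are not the actual JERM schema.

```python
from dataclasses import dataclass


@dataclass
class JermRecord:
    """One answer per JERM question; all names are illustrative only."""
    data_type: str         # What type of data is it?
    measured: str          # What was measured?
    units: str             # What do the values mean?
    experiment: str        # Which experiment does it relate to?
    created_by_sop: str    # How was the data created?
    time_series: bool = False
    repeats: int = 1


REQUIRED = ("data_type", "measured", "units", "experiment", "created_by_sop")


def unanswered(record):
    """Return the names of the required questions left blank."""
    return [name for name in REQUIRED
            if not str(getattr(record, name)).strip()]


# Hypothetical record where the SOP link has not been filled in yet.
record = JermRecord(
    data_type="growth curve",
    measured="OD",
    units="OD600",
    experiment="glucose pulse 3",
    created_by_sop="",
    time_series=True,
)
gaps = unanswered(record)
```

Reporting gaps rather than refusing the record fits the "just enough" approach: a data set can be registered first and completed later.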
The Idea
1. For each data type…
  Transcriptomics
  Proteomics
  Metabolomics
  Single Cell Data
2. Define a JERM…
  Top-down analysis of standards
  Bottom-up analysis of practice
3. Generate and apply…
  JERM template
  JERM extractor for data host
  Subset registered in SEEK
  Access / export through JERM interface / template
[Diagram labels: ISA-TAB, COSMIC, BaCell-SysMO, SysMOLab, MOSES, Alfresco, Wiki, another, a datastore]
JERM Adaptors
JERM Source Extractor Generator
  New spreadsheets adopt JERM templates
  Legacy spreadsheets have a JERM mapper
  Databases have a JERM mapper
Spreadsheet Ontology Annotator
  Restrict the values that a range of fields can have
Just Enough Results Model Tools
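Two of the tools named above, a mapper for legacy spreadsheet headers and a restriction on the values a field may take, can be sketched in a few lines. The header map and vocabularies here are made up from the examples on the JERM slide; they are not the real JERM mappings or ontology terms.

```python
# Hypothetical mapping from a lab's legacy column headers to JERM-style names.
LEGACY_HEADERS = {"type": "data_type", "what": "measured"}

# Hypothetical controlled vocabularies, using the slide's example values.
ALLOWED = {
    "data_type": {"microarray", "growth curve", "enzyme activity"},
    "measured": {"gene expression", "OD", "metabolite concentration"},
}


def map_legacy_row(row, header_map=LEGACY_HEADERS):
    """Rename known legacy headers; leave unknown columns untouched."""
    return {header_map.get(key.strip().lower(), key): value
            for key, value in row.items()}


def check_row(row, allowed=ALLOWED):
    """Return (field, value) pairs whose value is outside the vocabulary."""
    problems = []
    for field_name, vocabulary in allowed.items():
        value = row.get(field_name)
        if value is not None and value not in vocabulary:
            problems.append((field_name, value))
    return problems


legacy = {"Type": "microarray", "What": "optical density", "Lab": "B12"}
mapped = map_legacy_row(legacy)
problems = check_row(mapped)
```

Here the free-text "optical density" is flagged because only "OD" is in the assumed vocabulary, which is the kind of variable description the annotator is meant to homogenise.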
[Architecture diagram labels: Metadata; SABIO-RK, BRENDA, myDB, mySpreadSheet; JERM Web Service Access Interface; Access Control; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor]
Experimental Data Metadata
Based on ISA-TAB: Investigation, Study, Assay
[Diagram labels: People, Projects, Experimental conditions, Factors studied, Models, SOPs, Workflows]
Homogenised terminology and values in the datasets themselves
SEEK + JERM
Incremental Annotation
Metadata can be added to assets at any time
  Extracted from JERM templates
  Added by the data owner through SEEK
  Added by another SysMO consortium member with editing permission
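The three routes for adding metadata listed above can be sketched as an append-only annotation log. Everything here is an assumption for illustration: the class names, the source labels and the rule that later annotations override earlier ones are not taken from SEEK.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    key: str
    value: str
    source: str   # "jerm_template", "owner" or "member" (hypothetical labels)
    author: str


@dataclass
class CatalogueEntry:
    title: str
    owner: str
    editors: set = field(default_factory=set)
    annotations: list = field(default_factory=list)

    def annotate(self, key, value, author, from_template=False):
        """Record an annotation, noting which of the three routes it came by."""
        if from_template:
            source = "jerm_template"
        elif author == self.owner:
            source = "owner"
        elif author in self.editors:
            source = "member"
        else:
            raise PermissionError(f"{author} may not edit {self.title!r}")
        self.annotations.append(Annotation(key, value, source, author))

    def metadata(self):
        """Current view: the latest value recorded for each key."""
        return {a.key: a.value for a in self.annotations}


entry = CatalogueEntry("Glucose pulse data", owner="alice", editors={"bob"})
entry.annotate("units", "mM", "extractor", from_template=True)
entry.annotate("organism", "yeast", "alice")
entry.annotate("units", "uM", "bob")
```

Keeping every annotation with its source means an extracted value can be corrected later without losing the record of where each value came from.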
In Practice for Spreadsheets
[Diagram: spreadsheets are Native, JERM Template or JERMed. Steps: Register, Extract, Matched to the JERM, Adding metadata; then browse and search.]
Now: whole record
Near future: whole record, filtered record, enriched record
Future: collections of records, meta-analysis
[Diagram labels: Spreadsheet Repository, Models Repository, SOP Repository, Workflow Repository]
What we have done…
[Architecture diagram labels: SysMO-SEEK web interface; Search; Assets Catalogue; Yellow Pages; SysMO DB; JERM; Consortium Data, Models, Processes, SOPs and Workflows; Public data; JWS Online; Workflow Management System; SBML; Nature Protocols]
Outstanding Issues
Keeping data at project sites has responsibilities
  Reliability: sites available continuously and promptly
  Support: must be proof against virus attacks, etc.
  Archiving: beyond the lifetime of the project. What happens when a project is no longer part of the SysMO consortium?
Lessons
Find a solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work
  PhD students, post-docs
Let the scientists retain control over their data and who can see it
Don’t reinvent. Use available vocabularies, minimal model standards
Help prevent people duplicating work by linking the people as well as the resources
Acknowledgements
SysMO-DB Team
SysMO-PALS
myGrid, EML and JWS Online teams
OMII-UK, Uni Southampton
EMBL-EBI, MCISB