SysMO-DB: Just Enough Exchange for Systems Biology Data and Models Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester Wolfgang Müller, O. Krebs, Isabel Rojas – EML Research gGmbH (=not for profit) Jacky Snoep - University of Stellenbosch eScience Workshop, Pittsburgh, PA


Page 1

SysMO-DB: Just Enough Exchange for Systems Biology Data and Models

Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester

Wolfgang Müller, Olga Krebs, Isabel Rojas - EML Research gGmbH (not for profit)

Jacky Snoep - University of Stellenbosch

MS eScience Workshop, Pittsburgh, PA

Page 2

SysMO = SYStems biology of Micro Organisms

[Figure: map with counts (2), (2), (29), (22), (9), (4), (1)]

11 projects, 91 partners, 9 countries, started 2007

Page 3

SysMO-DB

Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites

Sensitively retrofit a data access, model handling and data integration platform.

Support and manage the diversity of data, models and competencies.

Web-based solution:
  exchange of data, models and processes (intra- and inter-consortia)
  search for data, models and processes across the initiative
  dissemination of results

Page 4

SysMO-DB Team

University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs

EML Research gGmbH, Germany: Isabel Rojas, Wolfgang Müller, Olga Krebs

University of Stellenbosch, South Africa: Jacky Snoep

Page 5

Connect projects, connect to outside

[Diagram labels: Personal; My Disk: data, models, workflows; Project; Project-specific solutions; Internally used tools & data; SysMO-DB, inter-project; Public; Outside data and tools]

Page 6

Own solutions: Own data solutions and collaboration environments: wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.

Suspicion: Suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.

Data issues: Many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains.

Resource issues: No extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.

Page 7

Principles…

Go for a series of small victories
Realistic
Don't reinvent
Migrate to standards
Sustainable and extensible
Provide instant gratification
Address doubt and anxiety
Build it

Page 8

Three types of people

[Diagram: Modellers, Experimentalists and Bioinformaticians, connected by arrows labelled "Exchange"]

Page 9

"Natural" collaboration within SysMO

Short, simplified, black and white:

Collaboration during project design
Varying methods of collaboration during project
  Binomes (one modeller, one experimentalist)
  Groups collaborating with groups (occasional/formalized exchange of information)
Varying success
Need for a watering hole/meeting point
  Application where experimentalists/bioinformaticians/modellers meet

[Image: "Hot Watering Hole Action" by betty x1138 (NYC, USA), taken 2005-07-14, http://flickr.com/photos/98334721@N00/25901056]

Trying to make experimentalists, modellers, bioinformaticians peacefully share resources

Page 10

Some numbers & some consequences

Some numbers:
  1 software engineer, 1 bioinformatician, 1 bio-database specialist
  11 projects, 91 partners
  20 programmer days/year/project
  2.5 programmer days/year/partner

Some consequences:
  "Just in case" approach impossible
  Focus on real needs: "just in time", "just enough"
  The right 20%
  Help people help themselves
  Communication!

80-20 rule: 80% of the features won't be used anyway

[Chart labels: 20%, 80%, Useful features]
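The per-project and per-partner figures on this slide follow from dividing one software engineer's working year across the projects and partners. A minimal sketch of that arithmetic, assuming roughly 220 working days per year (this number is an assumption, not stated on the slide):

```python
# Assumption: one software engineer works about 220 days per year.
WORKING_DAYS_PER_YEAR = 220
PROGRAMMERS = 1   # from the slide: 1 software engineer
PROJECTS = 11     # from the slide
PARTNERS = 91     # from the slide


def days_per_unit(units: int,
                  programmers: int = PROGRAMMERS,
                  working_days: int = WORKING_DAYS_PER_YEAR) -> float:
    """Programmer days available per project (or partner) per year."""
    return programmers * working_days / units


per_project = days_per_unit(PROJECTS)
per_partner = days_per_unit(PARTNERS)

print(f"{per_project:.1f} programmer days/year/project")
print(f"{per_partner:.1f} programmer days/year/partner")
```

Under this assumption the per-partner value comes out slightly below the slide's rounded 2.5; either way the budget is a couple of days per partner per year, which is why a "just in case" feature set is out of reach.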

Page 11

Social approach

Questionnaires
PALs (Project Area Liaisons)
  21 postdocs and PhD students
  Bio/bioinf/modeller
  Our design and technical collaboration team
  Very intense face-to-face and virtual collaboration
  UK and Continental PALs chapters
Audits and sharing
  Methods, data, models, standards, software, schemas, spreadsheets, SOPs …

Page 12

Communication via PALs

[Diagram: DB team, PALs, Projects]

DB team: show what is there; suggest what is possible; ask for requirements
PALs: double check; transmit; disseminate; collect answers
Projects: give requirements; tell priorities; rate outcomes; suggest improvements

Page 13

Outcome of first PALs meeting:

Need to find the guy who does xyz: Yellow pages
Need to store Standard Operating Procedures
Almost all our data is Excel

Page 14

What's there

SysMO-SEEK screenshots

Page 15

Yellow pages

[Screenshot labels: Tag clouds; Bookmarks; Yellow pages tabs; ISA tabs]

Page 16

Standard Operating Procedures

Page 17

JWS connection for modellers

Page 18

View Study

Page 19

New Assay (ISA)

Page 20

Rights and sharing

Page 21

Rights and sharing: create group

Page 22

So much for the webapp

Rights + sharing
Connection to modellers' tools
Yellow pages
SOPs

Page 23

Almost there: Improved Excel support

Matthew Horridge

Page 24

Towards Just-Enough Exchange

Incremental steps from beta to beta

Page 25

Towards Just-Enough Exchange

Largely a story about how to handle Excel sheets for the users' benefit

Page 26

SysMO Just Enough Exchange

[Diagram labels: COSMIC, Alfresco; BaCell-SysMO, Alfresco; MOSES, Wiki; SysMO-LAB, Wiki; SABIO-RK; Public Resources; Spreadsheets; BASE]

Page 27

Need for tradeoff

Huge number of systems
Huge number of standards (MIBBI, OBO …)
  Some of them big standards
Too much for a few people to cope with, but:
  Comparison needs standardisation
  Search needs standardisation
Need to move incrementally to just-enough standard implementation

Page 28

Path = goal

The journey is part of the reward

Let people use what they use anyway
If changes are necessary, be as unintrusive as possible
Be aware of legacy data
Nudge people towards best practices
Give instantly useful added value to as many users as possible: simple search, simple exchange, simple tool use

Page 29

A roadmap

Provide convincing Web 2.0 functionality for use and as appetizer
  Yellow pages
  SOPs
Upload service
  Hand-triggered upload of link/file
  Hand-added metadata
Harvesting + change detection service
  Automatic download
  Hand-added metadata
Support for Excel templates
  Promote internal standards by use + tooling
  Mappers + parsers
  Classifiers
Use other data types where appropriate
  SBML, Matlab, Mathematica …
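The "harvesting + change detection" step of the roadmap can be illustrated with content fingerprints: after each automatic download, a file's hash is compared with the one recorded on the previous run. This is only a sketch under assumptions; the function names, the file names and the use of SHA-256 are illustrative and are not taken from SysMO-DB itself.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used to detect content changes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(known: Dict[str, str],
                   fetched: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare freshly harvested files against fingerprints from the last run.

    known   maps file name -> fingerprint recorded previously
    fetched maps file name -> content downloaded in this run
    """
    report: Dict[str, List[str]] = {
        "new": [], "changed": [], "unchanged": [], "missing": []
    }
    for name, content in fetched.items():
        digest = fingerprint(content)
        if name not in known:
            report["new"].append(name)
        elif known[name] != digest:
            report["changed"].append(name)
        else:
            report["unchanged"].append(name)
    # Files seen before but absent from this harvest.
    for name in known:
        if name not in fetched:
            report["missing"].append(name)
    return report


# Hypothetical example: one spreadsheet edited, one added, one removed.
previous = {"assay1.xls": fingerprint(b"v1"), "old.xls": fingerprint(b"x")}
current = {"assay1.xls": b"v2", "assay2.xls": b"new data"}
print(detect_changes(previous, current))
```

Only files reported as new or changed would then need hand-added metadata, which keeps the manual effort proportional to what actually changed.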

Page 30

Stability hierarchy

[Diagram, in order of increasing stability:]
Single group: template for a group of experiments
Single SysMO project: project-level template
Whole SysMO: more stable JERM data model, template best practice

Parsers/annotators: enter into that
Use mappers where needed

Page 31

JERM Extraction Architecture

[Diagram components: Project repositories; Harvester; Data handler; Classifier/Dispatcher; Template recognizer; Extractor; Parser; Mapper. Outputs: Data and Metadata.]
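The component names on this slide suggest a pipeline in which harvested files are classified, matched to a template, extracted and then mapped to common terms. The sketch below is one possible reading of that flow; the interfaces, the "timecourse" template, the header rule and the key mapping are all hypothetical and are not the actual JERM implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Rows = List[List[str]]


@dataclass
class Extracted:
    """Output of an extractor: the data rows plus descriptive metadata."""
    data: Rows
    metadata: Dict[str, str] = field(default_factory=dict)


def recognize_template(rows: Rows) -> Optional[str]:
    """Template recognizer: guess the template from the header row."""
    header = [cell.strip().lower() for cell in rows[0]] if rows else []
    if header[:2] == ["time", "concentration"]:
        return "timecourse"  # hypothetical template name
    return None


def extract_timecourse(rows: Rows) -> Extracted:
    """Extractor for the hypothetical time-course template."""
    return Extracted(data=rows[1:],
                     metadata={"template": "timecourse",
                               "cols": ",".join(rows[0])})


def extract_unknown(rows: Rows) -> Extracted:
    """Fallback: keep everything and flag the template as unknown."""
    return Extracted(data=rows, metadata={"template": "unknown"})


EXTRACTORS: Dict[str, Callable[[Rows], Extracted]] = {
    "timecourse": extract_timecourse,
}


def map_metadata(extracted: Extracted, mapping: Dict[str, str]) -> Extracted:
    """Mapper: rename project-specific metadata keys to common ones."""
    renamed = {mapping.get(k, k): v for k, v in extracted.metadata.items()}
    return Extracted(data=extracted.data, metadata=renamed)


def dispatch(rows: Rows) -> Extracted:
    """Classifier/dispatcher: route rows to the matching extractor."""
    template = recognize_template(rows)
    extractor = EXTRACTORS.get(template, extract_unknown)
    return map_metadata(extractor(rows), {"cols": "columns"})


result = dispatch([["Time", "Concentration"], ["0", "1.5"], ["10", "1.2"]])
print(result.metadata, result.data)
```

Keeping recognizers, extractors and mappers as separate small functions means a new project template only needs a new entry in the dispatch table rather than changes to the harvester.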

Page 32

Oops

Some projects not prolonged

Need all project data in the system fast, so…

Page 33

JERM Extraction Architecture

[Diagram components: Project repositories; Harvester; Data handler; Classifier/Dispatcher; Template recognizer; Extractor; Parser; Mapper. Outputs: Data and Metadata.]

Page 34

Lessons we're learning

Some interesting bits along the way

Page 35

Subsetting: Don't overwhelm

Standards need to be comprehensive
  Goal: "Minimum information" … (MIBBI)
  Tends to be superset of what is needed for a project
Example for non-applicable attributes
  Tissue of a single cell
  Gender
Useful to use adapted subset-templates

[Screenshot: experimental design selection list]
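The idea of an adapted subset-template can be sketched as filtering a comprehensive checklist down to the attributes that apply to a project, and then checking records only against that subset. The attribute names and the example record below are hypothetical and are not taken from any actual MIBBI checklist.

```python
from typing import Dict, List, Set

# Hypothetical comprehensive checklist (stand-in for a
# "minimum information" standard).
FULL_CHECKLIST: List[str] = [
    "organism", "strain", "tissue", "gender",
    "growth_medium", "temperature",
]

# Attributes that make no sense for single-cell microorganisms,
# following the examples on the slide.
NOT_APPLICABLE: Set[str] = {"tissue", "gender"}


def subset_template(checklist: List[str],
                    not_applicable: Set[str]) -> List[str]:
    """Keep only the attributes a project actually needs, in order."""
    return [attr for attr in checklist if attr not in not_applicable]


def missing_fields(record: Dict[str, str], template: List[str]) -> List[str]:
    """List template attributes that are absent or empty in a record."""
    return [attr for attr in template if not record.get(attr)]


template = subset_template(FULL_CHECKLIST, NOT_APPLICABLE)
record = {"organism": "Bacillus subtilis", "strain": ""}  # example values
print(template)
print(missing_fields(record, template))
```

Users are then asked only for fields that apply to them, while the full checklist remains available for mapping to the standard later.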

Page 36

From biofolksonomy to ontology

Observation:
  Fast-growing set of standards
  Standards are a moving target
Incremental approach:
  Keyword annotation
  Controlled selection lists
  Home-brewed taxonomies
  Use of/contribution to standard ontologies
  Provide migration tools

[Screenshots: Tags + suggestions; Home-brewed taxonomy]
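The step from free keywords to a controlled selection list ("tags + suggestions") can be sketched with fuzzy matching: a user's tag is compared against the controlled terms and the closest ones are offered. This is an illustration under assumptions; the vocabulary and cutoff are hypothetical, and Python's standard difflib stands in for whatever matching SysMO-SEEK actually uses.

```python
import difflib
from typing import List

# Hypothetical controlled selection list.
CONTROLLED_TERMS: List[str] = ["glucose", "glycerol", "galactose"]


def normalize(tag: str) -> str:
    """Lower-case and trim a free-text tag before comparison."""
    return tag.strip().lower()


def suggest_terms(tag: str, vocabulary: List[str],
                  limit: int = 3, cutoff: float = 0.6) -> List[str]:
    """Suggest controlled terms similar to a free tag, best match first."""
    return difflib.get_close_matches(normalize(tag), vocabulary,
                                     n=limit, cutoff=cutoff)


suggestions = suggest_terms(" Glucos", CONTROLLED_TERMS)
print(suggestions)
```

Accepted suggestions could be recorded as tag-to-term pairs, which is one way a migration tool might later move existing keyword annotations onto a taxonomy or ontology.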

Page 37

A word on software

Template tooling
  Excel
  Java
SysMO-SEEK (open source under Apache license)
  Ruby on Rails
    Convention over configuration
  Libraries & plugins
    Rails-specific (e.g. acts_as_authenticated)
    SOLR & Lucene introduce Java/Ruby
  Database: MySQL, also tested with SQLite (exclude db dependencies)

Page 38

Summary

SysMO-DB as a virtual meeting point for different flavours of systems biologists

SysMO-DB's mantra: just enough, just in time
Flexible JERM extraction architecture
Just enough metadata (incremental)
Lot done, still a lot to do

Page 39

Challenges ahead…

Social
  PALs work great and are motivated
  Now need more, more, more data, data, data
Technical
  Publishing into public repositories
  Search + exploration: the test for data quality
    Hierarchical faceted search
    Distributed search via Taverna workflows
  More workflows via SysMO-SEEK
  Improve modelling support

Page 40

Bonus track: what if…

… the average data quality is below par?

"Nagging functionality"
  Remind people of potentially faulty metadata
  Give suggestions what to improve and how
  Give possibility to create automatic mappings
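A "nagging" check of this kind can be sketched as a function that inspects a metadata record and returns reminders with a concrete suggestion for each problem. The required fields, the allowed units and the message wording are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List

# Hypothetical rules; a real deployment would derive them from templates.
REQUIRED_FIELDS: List[str] = ["organism", "strain", "sop"]
ALLOWED_UNITS = {"mM", "uM", "g/l"}


def nag(record: Dict[str, str]) -> List[str]:
    """Return reminders about missing or suspicious metadata."""
    reminders: List[str] = []
    for name in REQUIRED_FIELDS:
        if not record.get(name, "").strip():
            reminders.append(f"'{name}' is empty: please fill it in.")
    unit = record.get("unit")
    if unit is not None and unit not in ALLOWED_UNITS:
        options = ", ".join(sorted(ALLOWED_UNITS))
        reminders.append(
            f"Unit '{unit}' is not in the selection list; "
            f"did you mean one of: {options}?"
        )
    return reminders


# Hypothetical record with an empty field, a missing field and an odd unit.
example = {"organism": "Bacillus subtilis", "strain": "", "unit": "mmol"}
for reminder in nag(example):
    print(reminder)
```

Each reminder names the field and proposes a fix, in line with the slide's point that nagging should suggest what to improve and how.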

Page 41

Thanks

EML people: Isabel, Olga

UMAN people: Carole, Katy, Finn, Stuart, Sergejs

Jacky at Stellenbosch

BBSRC, BMBF, KTF

… and Microsoft for sponsoring this workshop

Page 42

www.sysmo-db.org

End + questions

Page 43

END