62
Computational Approaches to Systems Biology Michael Hucka, Ph.D. Department of Computing + Mathematical Sciences California Institute of Technology Pasadena, CA, USA The Kinghorn Cancer Centre, Australia, August 2013 Email: [email protected] Twitter: @mhucka

Computational Approaches to Systems Biology

Embed Size (px)

DESCRIPTION

Presentation given at the Sydney Computational Biologists meetup on 21 August 2013 (http://australianbioinformatics.net/past-events/2013/8/21/computational-approaches-to-systems-biology.html).

Citation preview

Page 2: Computational Approaches to Systems Biology

Outli

ne

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 3: Computational Approaches to Systems Biology

Outli

ne

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 4: Computational Approaches to Systems Biology

Research today: experimentation, computation, cogitation

Page 5: Computational Approaches to Systems Biology

“ The nature of systems biology”Bruggeman & Westerhoff,

Trends Microbiol. 15 (2007).

Page 6: Computational Approaches to Systems Biology

Large-scale integrative models are growing

Page 7: Computational Approaches to Systems Biology

Many models have traditionally been published this way

Problems:

• Errors in printing

• Missing information

• Dependencies onimplementation

• Outright errors

• Can be a hugeeffort to recreate

Is it enough to communicate the model in a paper?

Page 8: Computational Approaches to Systems Biology

Is it enough to make your (software X) code available?It’s vital for good science:

• Someone with access to the same software can try to run it, understand it, verify the computational results, build on them, etc.

• Opinion: you should always do this in any case

Page 9: Computational Approaches to Systems Biology

Is it enough to make your (software X) code available?It’s vital for good science—

• Someone with access to the same software can try to run it, understand it, build on it, etc.

• Opinion: you should always do this in any case

But it’s still not ideal for communication of scientific results:

• Doesn’t necessarily encode biological semantics of the model

• What if they don’t have access to the same software?

• What if they don’t want to use that software?

• What if they want to use a different conceptual framework?

• And how will people be able to relate the model to other work?

Page 10: Computational Approaches to Systems Biology

Different tools ⇒ different interfaces & languages

Page 11: Computational Approaches to Systems Biology

Outli

ne

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 12: Computational Approaches to Systems Biology

SBML: a lingua fra

nca

for software

Page 13: Computational Approaches to Systems Biology

Format for representing computational models of biological processes

• Data structures + usage principles + serialization to XML

• (Mostly) Declarative, not procedural—not a scripting language

Neutral with respect to modeling framework

• E.g., ODE, stochastic systems, etc.

Important: software reads/writes SBML, not humans <Beginning of SBML model definition>

List of function definitionsList of unit definitionsList of compartments

List of molecular speciesList of parameters

List of rulesList of reactions

List of events<End of SBML model definition>

SBML = Systems Biology Markup Language

Page 14: Computational Approaches to Systems Biology

The raw SBML (as XML)

Page 15: Computational Approaches to Systems Biology

The process is central

• Literally called a “reaction” in SBML

• Participants are pools of entities (biochemical species)

Models can further include:

• Compartments

• Other constants & variables

• Discontinuous events

• Other, explicit math

Core SBML concepts are fairly simple

• Unit definitions

• Annotations

Page 16: Computational Approaches to Systems Biology

Well-stirred compartments

c

n

Some basics of SBML core model encoding

Page 17: Computational Approaches to Systems Biology

Species pools are located in compartments

c

n

protein A protein B

gene mRNAn mRNAc

Page 18: Computational Approaches to Systems Biology

Reactions can involve any species anywhere

c

n

protein A protein B

gene mRNAn mRNAc

Page 19: Computational Approaches to Systems Biology

Reactions can cross compartment boundaries

c

n

protein A protein B

gene mRNAn mRNAc

Page 20: Computational Approaches to Systems Biology

Reaction/process rates can be (almost) arbitrary formulas

c

n

protein A protein B

gene mRNAn mRNAc

f1(x)

f2(x)

f3(x)f4(x)

f5(x)

Page 21: Computational Approaches to Systems Biology

“Rules”: equations expressing relationships in addition to reaction sys.

c

n

protein A protein B

gene mRNAn mRNAc

f1(x)

f2(x)

f3(x)

g1(x)g2(x)

.

.

.

f4(x)

f5(x)

Page 22: Computational Approaches to Systems Biology

“Events”: discontinuous actions triggered by system conditions

c

n

protein A protein B

gene mRNAn mRNAc

f1(x)

f2(x)

f3(x)

g1(x)g2(x)

.

.

.

Event1: when (...condition...), do (...assignments...)

Event2: when (...condition...), do (...assignments...)

...

f4(x)

f5(x)

Page 23: Computational Approaches to Systems Biology

Annotations: machine-readable semantics and links to other resources

Event1: when (...condition...), do (...assignments...)

Event2: when (...condition...), do (...assignments...)

...

c

n

protein A protein B

gene mRNAn mRNAc

f1(x)

f2(x)

f3(x)

g1(x)g2(x)...

f4(x)

f5(x)

“This event represents ...”

“This is identified by GO id # ...”

“This is an enzymatic reaction with EC # ...”

“This is a transport into the nucleus ...” “This compartment

represents the nucleus ...”

Page 25: Computational Approaches to Systems Biology

Contents of BioModels DatabaseContents today:

• 142,000+ pathway models (converted from KEGG)

• 460+ hand-curated quantitative models

• 460+ non-curated quantitative models

8%2%

3%6%

6%

7%

8%

9%24%

27%

signal transductionmetabolic processmulticelullar organismal processrhythmic processcell cyclehomeostatic processresponse to stimuluscell deathlocalizationothers (e.g., developmental process)

Database data from 2013

Page 26: Computational Approaches to Systems Biology

Find software in the SBML Software Guide

Page 27: Computational Approaches to Systems Biology

Find SBML software

Find software in the SBML Software Guide

Page 28: Computational Approaches to Systems Biology

Question: Which of the following categories best describe your software? (Check all that apply.)

Results of 2011 survey of SBML-compatible software

Out of 81 responses

Simulation software

Analysis s/w (in addition, or instead of, simulation)

Creation/model development software

Visualization/display/formatting software

Utility software (e.g., format conversion)

Data integration and management software

Repository or database

Framework or library (for use in developing s/w)

S/w for interactive env. (e.g., MATLAB, R, ...)

Annotation software0 20 40 60 80

11

13

13

14

16

23

31

31

40

42

Page 29: Computational Approaches to Systems Biology

Some particularly full-featured, general simulation toolsCOPASI: ODE & stochastic simulation, parameter scanning, plotting

Virtual Cell: web-based environment, spatial models

iBioSim: special features for genetic circuit models for synthetic biology

SBW (Systems Biology Workbench): component-based toolkit

SBMLsimulator: Java-based simulator, web-start or stand-alone

CellDesigner: graphical editing, SBGN support, SABIO-RK integration

Page 30: Computational Approaches to Systems Biology

Free software libraries – libSBMLReads, writes, validates SBML

Can check & convert units

Written in portable C++

Runs on Linux, Mac, Windows

APIs for C, C++, C#, Java, Octave, Perl, Python, R, Ruby, MATLAB

Well documented API

Open-source (LGPL)

http://sbml.org/Software/libSBML

Page 31: Computational Approaches to Systems Biology

Evolution of SBML continuesToday: SBML Level 3

• Level 3 Core provides framework for common models

• Level 3 packages add additional constructs to the Core

Page 32: Computational Approaches to Systems Biology

Level 3 package What it enablesHierarchical model composition Models containing submodels ✔

Flux balance constraints Constraint-based models ✔

Qualitative models Petri net models, Boolean models ✔

Graph layout Diagrams of models ✔

Multicomponent/state species Entities w/ structure; also rule-based models draft

Spatial Nonhomogeneous spatial models draft

Graph rendering Diagrams of models draft

Groups Arbitrary grouping of components draft

Distributions Numerical values as statistical distributions in dev

Arrays & sets Arrays or sets of entities in dev

Dynamic structures Creation & destruction of components in dev

Annotations Richer annotation syntax

Status

Page 33: Computational Approaches to Systems Biology

National Institute of General Medical Sciences (USA) European Molecular Biology Laboratory (EMBL)JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)JST ERATO-SORST Program (Japan)ELIXIR (UK)Beckman Institute, Caltech (USA)Keio University (Japan)International Joint Research Program of NEDO (Japan)Japanese Ministry of AgricultureJapanese Ministry of Educ., Culture, Sports, Science and Tech.BBSRC (UK)National Science Foundation (USA)DARPA IPTO Bio-SPICE Bio-Computation Program (USA)Air Force Office of Scientific Research (USA)STRI, University of Hertfordshire (UK)Molecular Sciences Institute (USA)

SBML funding sources over the past 13+ years

Page 34: Computational Approaches to Systems Biology

Outli

ne

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 35: Computational Approaches to Systems Biology

Modelers want to use their own conventions

Page 36: Computational Approaches to Systems Biology

Modelers want to use their own conventions

No standard identifiers

Page 37: Computational Approaches to Systems Biology

Modelers want to use their own conventions

Low info content

No standard identifiers

Page 38: Computational Approaches to Systems Biology

Raw models alone are insufficient

Need standard schemes for machine-readable annotations

• Identify entities

• Mathematical semantics

• Links to other data resources

• Authorship & pub. info

Modelers want to use their own conventions

Low info content

No standard identifiers

Page 39: Computational Approaches to Systems Biology

Addresses 2 general areas of annotation needs:

MIRIAM is not specific to SBML

MIRIAM (Minimum Information Requested In the Annotation of Models)

Requirements for reference correspondence

Scheme for encoding annotations

Annotations for attributing model creators & sources

Annotations for referring to external

data resources

Page 40: Computational Approaches to Systems Biology

Addresses 2 general areas of annotation needs:

MIRIAM is not specific to SBML

MIRIAM (Minimum Information Requested In the Annotation of Models)

Requirements for reference correspondence

Scheme for encoding annotations

Annotations for attributing model creators & sources

Annotations for referring to external

data resources

Annotations for referring to external

data resources

Page 41: Computational Approaches to Systems Biology

Example of a problem that can be solved with annotations

http://www.ebi.ac.uk/chebi

Low info content

Page 42: Computational Approaches to Systems Biology

Example of a problem that can be solved with annotations

http://www.ebi.ac.uk/chebi

Low info content

Known by different names – do you want to write all of

them into your model?

salicylic acid

Page 43: Computational Approaches to Systems Biology

MIRIAM annotations for external referencesGoal: link model constituents to corresponding entities in bioinformatics resources (e.g., databases, controlled vocabularies)

• Supports:

- Precise identification of model constituents

- Discovery of models that concern the same thing

- Comparison of model constituents between different models

MIRIAM approach avoids putting data content directly in the model

• Instead, it points at external resources that contain the data

Page 44: Computational Approaches to Systems Biology

How do we create globally unique identifiers consistently?Long story short—developed by the Le Novère group at the EBI

• Resource identifiers (URIs) combine 2 parts:

• There’s a registry for namespaces: MIRIAM Registry

- Allows people & software to use same namespace identifiers

• There’s a URI resolution service: MIRIAM Resources & identifiers.org

- Allows people & software to take a given identifier and figure out what it points to

namespace entity identifier{ {

Identifies a dataset Identifies a datumwithin the dataset

Page 45: Computational Approaches to Systems Biology

Another problem: software can’t read figure legends

?

BIOMD0000000319 in BioModels Database

Decroly & Goldbeter, PNAS, 1982

Page 46: Computational Approaches to Systems Biology

SED-ML = Simulation Experiment Description MLApplication-independent format

• Captures procedures, algorithms, parameter values

Can be used for

• Simulation experiments encoding parametrizations & perturbations

• Simulations using more than one model and/or method

• Data manipulations to produce plot(s)

http://sedml.org

Simulation

Model

Task Data generators

Reports

Page 47: Computational Approaches to Systems Biology

Efforts like SED-ML improve reproducibility of publications

Waltemath et al., BMC Sys Bio 5, 2011.

Page 48: Computational Approaches to Systems Biology

Outli

ne

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 49: Computational Approaches to Systems Biology

Need interoperable formats, but developing them is not easyNeed people with diverse set of knowledge & skills

• Scientific needs

• Technical implementation skills

• Practical experience

Need manage multiple phases of a standardization effort

• Creation

• Evolution

• Support

Page 50: Computational Approaches to Systems Biology

Need interoperable formats, but developing them is not easyNeed people with diverse set of knowledge & skills

• Scientific needs

• Technical implementation skills

• Practical experience

Need manage multiple phases of a standardization effort

• Creation

• Evolution

• Support} This is just for the specification of the

standards, to say nothing of the necessary software and other infrastructure!

Page 51: Computational Approaches to Systems Biology

Realizations about the state of affairs in late-2000’s

• Many standardization efforts overlapped, but lacked coordination

• Efforts were inventing their own processes from scratch

• Many individual meetings meant more travel for many people

• Limited and fragile funding didn’t support solid, coherent base

COMBINE = Computational Modeling in Biology Network

• Coordinate standards development

• Develop common procedures & tools (but not impose them!)

• Coordinate meetings

• Provide a recognized voice

Motivations for the creation of COMBINE

Page 52: Computational Approaches to Systems Biology

Standardization efforts represented in COMBINE today

BioPAX

Qualifiers

GPML

COMBINE Standards

Associated Standardization Efforts

Related Standardization Efforts

Page 53: Computational Approaches to Systems Biology

COMBINE formats cover many types of models– from Nicolas Le Novère

Page 54: Computational Approaches to Systems Biology

Examples of community organizationTwo main annual meetings, plus ad hoc workshops

• COMBINE meeting: status updates, presentations, outreach

- Next COMBINE: Paris, Sep 16–20, 2013

• HARMONY: Hackathon on Resources for Modeling in Biology

- Software development, interoperability hacking

COMBINE 2012, TorontoCOMBINE 2011, Heidelberg

Page 55: Computational Approaches to Systems Biology

COMBINE is open to all—and COMBINE needs you!

http://co.mbine.org

Current coordinators:

• Nicolas Le Novère, Mike Hucka, Falk Schreiber, Gary Bader

Page 56: Computational Approaches to Systems Biology

Outli

ne

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 57: Computational Approaches to Systems Biology

Time it well

• Too early and too late are bad

Start with actual stakeholders

• Address real needs, not perceived ones

Start with small team of dedicated developers

• Can work faster, more focused; also avoids “designed-by-committee”

Engage people constantly, in many ways

• Electronic forums, email, electronic voting, surveys, hackathons

Make the results free and open-source

• Makes people comfortable knowing it will always be available

Be creative about seeking funding

Some things we (maybe?) got right with SBML

Page 58: Computational Approaches to Systems Biology

Not waiting for implementations before freezing specifications

• Sometimes finalized specification before implementations tested it

- Especially bad when we failed to do a good job

‣ E.g., “forward thinking” features, or “elegant” designs

Not formalizing the development process sufficiently

• Especially early in the history, did not have a very open process

Not resolving intellectual property issues from the beginning

• Industrial users ask “who has the right to give any rights to this?”

Some things we certainly got wrong

Page 59: Computational Approaches to Systems Biology

Nicolas Le Novère, Henning Hermjakob, Camille Laibe, Chen Li, Lukas Endler, Nico Rodriguez, Marco Donizelli, Viji Chelliah, Mélanie Courtot, Harish Dharuri

Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010

John C. Doyle, Hiroaki Kitano

Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney, Herbert Sauro, Hamid Bolouri, Ben Bornstein, Bruce Shapiro, Akira Funahashi, Akiya Juraku, Ben Kovitz

Original PI’s:

SBML Team:

SBML Editors:

BioModels DB:

Mike Hucka, Nicolas Le Novère, Sarah Keating, Frank Bergmann, Lucian Smith, Chris Myers, Stefan Hoops, Sven Sahle, James Schaff, Darren Wilkinson

And a huge thanks to many others in the COMBINE community

This work was made possible thanks to a great community

Page 60: Computational Approaches to Systems Biology
Page 61: Computational Approaches to Systems Biology

SBML http://sbml.org

BioModels Database http://biomodels.net/biomodels

MIRIAM http://biomodels.net/miriam

identifiers.org http://identifiers.org

SED-ML http://biomodels.net/sed-ml

SBO http://biomodels.net/sbo

SBGN http://sbgn.org

COMBINE http://co.mbine.org

URLs

Page 62: Computational Approaches to Systems Biology

I’d like your feedback!You can use this anonymous form:

http://tinyurl.com/mhuckafeedback