Upload
mike-hucka
View
1.144
Download
2
Embed Size (px)
DESCRIPTION
Slides from presentation given on November 21, 2011, at the 4th Global COE International Symposium on Physiome and Systems Biology for Integrated Life Sciences and Predictive Medicine, in Osaka, Japan.
Citation preview
SBML and related resources and standardization efforts
Michael HuckaMember of the Professional Staff
Computing + Mathematical SciencesCalifornia Institute of Technology
1
Cogitation
Research today: experimentation, modeling, cogitation2
3
Different tools ⇒ different interfaces & languages
4
SBML
5
Format for representing computational models
• Data structures + rules for their use + serialization to XML
Neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
A lingua franca for software (not humans)
SBML = Systems Biology Markup Language
6
The reaction is central: a process occurring at a given rate
• Participants are pools of entities (species)
Models can further include:
• Other constants & variables
• Compartments
• Explicit math
• Discontinuous events
Basic SBML concepts are fairly simple
naA + nbBf([A],[B],[P ],...)������������⇥npP
ncCf(...)���⇥ ndD + neE + nfF
...
• Unit definitions
• Annotations
7
Example of a common type of model
Tyson et al. (1991) PNAS 88(1):7328–32
Simulationoutput
8
Scope of SBML encompasses many types of models
Signaling pathway models Fernandez et al. (2006)
DARPP-32 Is a Robust Integrator of Dopamine and Glutamate Signals
PLoS Computational Biology
BioModels Database model#BIOMD0000000153
9
Scope of SBML encompasses many types of models
Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Hodgkin & Huxley (1952)
A quantitative description of membrane current and its application to conduction and excitation in nerve
J. Physiology 117:500–544
BioModels Database model#BIOMD0000000020
10
Scope of SBML encompasses many types of models
Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Neural models
• “Events” for discontinuous changesin quantitative parameters
Izhikevich EM. (2003)
Simple model of spiking neurons.
IEEE Trans Neural Net.
BioModels Database model#BIOMD0000000127
11
Scope of SBML encompasses many types of models
Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Neural models
• “Events” for discontinuous changesin quantitative parameters
Pharmacokinetic/dynamics models
• “Species” is not required to be abiochemical entity
Tham et al. (2008)
A pharmacodynamic model for the time course of tumor shrinkage by gemcitabine + carboplatin in non-small cell lung cancer patients
Clin. Cancer Res. 14
BioModels Database model#BIOMD0000000234
12
Scope of SBML encompasses many types of models
Signaling pathway models
Conductance-based models
• “Rate rules” for temporal evolution of quantitative parameters
Neural models
• “Events” for discontinuous changesin quantitative parameters
Pharmacokinetic/dynamics models
• “Species” is not required to be abiochemical entity
Infectious diseases
Munz et al. (2009 )
When zombies attack!: Mathematical modelling of an outbreak of zombie infection
Infectious Disease Modelling Research Progress, eds. Tchuenche et al., p. 133–150
BioModels Database model#MODEL1008060001
13
SBML Level 1 SBML Level 2 SBML Level 3
predefined math functions user-defined functions user-defined functions
text-string math notation MathML subset MathML subset
reserved namespaces for annotations
no reserved namespaces for annotations
no reserved namespaces for annotations
no controlled annotation scheme
RDF-based controlled annotation scheme
RDF-based controlled annotation scheme
no discrete events discrete events discrete events
default values defined default values defined no default values
monolithic monolithic modular
14
SBML Level 3: Supporting more categories of models
A package adds constructs & capabilities
Models declare which packages they use
• Applications tell users which packages they support
Package development can be decoupled
SBML Level 3 Core
Package X Package Y Package Z
Package W
(dependencies)
15
Level 3 package Active?Preliminary libSBML 5
plug-in available?
Graph layout
Groups
Hierarchical composition
Flux balance constraints
Spatial
Multicomponent species
Annotations
Graph rendering
Distribution & ranges
Qualitative models
Dynamic structures
Arrays & sets
✓✓✓✓✓✓
✓✓
✓
✓
16
Level 3 package Active?Preliminary libSBML 5
plug-in available?
Graph layout
Groups
Hierarchical composition
Flux balance constraints
Spatial
Multicomponent species
Annotations
Graph rendering
Distribution & ranges
Qualitative models
Dynamic structures
Arrays & sets
✓✓✓✓✓✓
✓✓
✓
✓
Models composed of submodels
16
Level 3 package Active?Preliminary libSBML 5
plug-in available?
Graph layout
Groups
Hierarchical composition
Flux balance constraints
Spatial
Multicomponent species
Annotations
Graph rendering
Distribution & ranges
Qualitative models
Dynamic structures
Arrays & sets
✓✓✓✓✓✓
✓✓
✓
✓
2-D and 3-D spatial geometries and spatial processes
16
Hierarchical model composition
17
Goal of supporting model composition is not new
Modular SBML
MAX!PLANCK!INSTITUT
TECHNISCHER SYSTEMEMAGDEBURG
DYNAMIK KOMPLEXER
Martin GinkelMax-Planck-Institute for Dynamics of complex technical Systems
Magdeburg, Germany
5th July 2002
The Systems Biology Markup Language (SBML) [1-3] is a computer-readable format for representing models of
biochemical reaction networks. It is applicable to many subject areas:
• metabolic networks, • cell-signaling pathways, • genomic regulatory networks, and • many other modelling problems in systems biology.
SBML is based on XML, a standard medium for representing and transporting data that is widely supported on the Internetas well as in computational biology and bioinformatics.
Because SBML is completely tool-independent, it enables • use of multiple simulation and analysis tools in a single research project without rewriting models for each tool • publication of models in peer-reviewed journals: other researchers can download and use your model even if they use a different modelling environment • survival of models: they can outlive the software used to create them, making your work still useful even if a particular simulation package is no longer supportedSBML has been evolving since mid-2000 through the efforts of many collaborators who make up the SBML Forum. Today,SBML is supported by over 60 software applications
[1] M. Hucka et al., The systems biology markup language (SBML): a medium for representation and exchange of biochemical networkmodels, Bioinformatics, Vol 19, 524-531[2] A. Finney and M. Hucka, Systems Biology Markup Language: Level 2 and Beyond, Biochem. Soc. Trans., Vol 31, 1472-1473[3] M. Hucka et al., Evolving a Lingua Franca and Associated Software Infrastructure for Computational Systems Biology: The SystemsBiology Markup Language (SBML) Project, Systems Biology, Vol 1, 41-53[4] M. Ginkel, Modular SBML, Proposal for an Extension of SBML towards level 2 Proceedings of the 5th Workshop on Software Platforms for Systems Biology, http://sbml.org/workshops/fifth/sbml-modular.pdf[5] J. Webb, BioSpice MDL Model Composition and Libraries http://bio.bbn.com/biospice/mdl/design/compose.html[6] A. Finney, Systems Biology Markup Language (SBML) Level 3 Proposal: Model Composition Featureshttp://www.cds.caltech.edu/~afinney/model-composition.pdf[7] H. Jˆnnson et al., Signalling in multicellular models of plant development, Proceedings of the 3rd International Conference on Systems Biology [8] A. Finney, Systems Biology Markup Language (SBML) Level 3 Proposal: Array Features, http://www.cds.caltech.edu/~afinney/arrays.pdf[9] A. Finney, Systems Biology Markup Language (SBML) Level 3 Proposal: Multicomponent Species Features,http://www.cds.caltech.edu/~afinney/multi-component-species.pdf
Support for the development of SBML and associated software and activities comes from the National Human GenomeResearch Institute (USA), the National Institute of General Medical Sciences (USA), the International Joint Research Programof NEDO (Japan), the ERATO-SORST Program of the Japan Science and Technology Agency (Japan), the Ministry ofAgriculture (Japan), the Ministry of Education, Culture, Sports, Science and Technology (Japan), the BBSRC e-ScienceInitiative (UK), the DARPA IPTO Bio-Computation Program (USA), and the Air Force Office of Scientific Research (USA).
As SBML evolves the community creates SBML Levels. Each new level adds new features to the language. SBML Level 2 wasstandardized in 2003. Simple software tools can use SBML Level 1, the first and most basic version of SBML. Moresophisticated systems can use SBML Level 2, with its enhanced capabilities. SBML Level 3 is actively being developedthrough the SBML Forum
In SBML Level 2 VSHFLHV�represents a pool of chemical entities all of the same single state in a specific compartment.VSHFLHV�cannot be composed from components. Given that several groups find this representation of species limited, aproposal for a multicomponent species extension to SBML has been written [9]. This proposal aims to satisfy the followingrequirements:
• Relate species of the same type that are located in different compartments • Enable reactions to defined that are generalized across compartments • Enable species to be defined as composed of components • Enable reactions to be generalized to apply to sets of species states
These requirements address the near-term needs of modellers of metabolic networks and the longer-term requirements ofmodellers of signal transduction networks.
In this section we show examples of two ways in which a reaction can be defined under this proposal. The following diagramshows an example of the first approach. The diagram shows a simple reaction in which two entities of types t and z areconsumed to create an entity of type s. The internal structure of t, z and s are not relevant to the reaction.
The following diagram shows the second more complex approach in which the reactants and products of a reaction aredefined as graphs of species instances. The diagram shows a reaction in which two entities come together to form a largermolecule. The instances of species types are identified so that the transformational details of the reaction are captured.
The complex reaction scheme described above is extended so that reactions can be applied to a class of species states ratherthan individual species states. Without this extension, all species states and the reactions that apply to them would have to beenumerated. A reaction can be generalized to cover all states of one or more binding sites. In the following example diagram,species type y has 2 binding sites C and D. This reaction shows that an entity t of type v binds to an entity s of type yirrespective of the state of the C binding site on s. The state of the C binding site on s is captured by the variable G which ismapped from the reactants to the product.
t z s++
v
A
w
B ++0 pv
A
w
B0 p
v
A
y
D ++s tv
A
y
Ds t
C
00 00
00 00CG G
The proposal described here [8] introduces a number of basic facilities that overcome some of the limitations of SBML Level 2and provide a foundation for a representation scheme that address all the requirements for a multicomponent speciesproposal.
The proposal introduces a new structure 6SHFLHV7\SH�which represents the set of all biochemical entities of a given typeirrespective of the location of those entities. Species structures can refer to species types which enables species of the sametype to be related together when the given species are located in different compartments. Similarly reactions can begeneralized to apply to species types instead of species. Such a reaction applies to all compartments in a model.
Arbitrary Subgraph
When composing a model, it is often necessary to merge objects from different submodels. The model compositionproposals provide mechanism for doing this. Consider the following model, without interfaces, containing twoinstances each of a different submodel. In this example, we merge species g with h and i with f:
d
g
e
Instance A
f
j
h
Instance B i
Reaction Link
d
g
e
Instance A
f
j
h
Instance Bi
The following model is equivalent but has defined interfaces:
D
E
G
H
F
J
SpeciesPort
Along with merging equivalent entities form a single object, when combining models it is useful to be able to createreactions that link models. The model composition proposals allow reactions to connect species in different instancesof submodels. For example, consider the following model containing a reaction between two ports:
a b c dQP
Instance X Instance Y
SBML Level 3 is being designed collaboratively by today's leading developers of open-source software forcomputational biology. SBML Level 3 development has been divided into several modules including:
• Diagrams: SBML extensions to store the graphical diagrams of models that can be created in many of today's graphical pathway editors.
• Model Composition: SBML extensions to support the representation of models that are composed from submodels (See Sections 'Proposals for Model Composition' and 'Model Composition Example').
• Multicomponent Species: SBML extensions to enable the compact representation of species having multiple possible states (e.g., due to phosphorylation) and/or configurations with other species (e.g., protein complexes). (See section 'Requirements of a Multicomponent Species Proposal' and following sections.)
• Arrays: SBML data structures to permit arrays of items (such as species, compartments, and others) to be grouped and manipulated en masse. Sparse arrays will be supported and could be used as a way to describe network connection schemes. (See Section 'Array Proposal').
• Spatial Features: SBML extensions to describe the 2-D and 3-D spatial characteristics of models, including the geometry of compartments, the diffusion properties of species, and the specification of different species concentrations across different regions of a cell.
• Controlled Vocabularies: extension of SBML to enable components of a model to be labelled with terms taken from biologically and computationally meaningful controlled vocabularies.
To date, there have been several proposals for SBML extensions to support model composition. These come from MartinGinkel (MPI Magdeburg) [4], Jonathan Webb (BBN) [5] and Andrew Finney [6]. The common idea is to support thecomposition of larger models from smaller ones (submodels). Under these proposals, a model could contain:
• Submodel definitions: Models may be contained within an SBML document or an SBML document can reference external models.
• Instances of submodels: Models may contain instances of submodels that are complete copies of the submodels. A model can contain more than one instance of a submodel. A model consists of a hierarchy of instances of submodels.
• Links between objects: Models may contain links between objects at arbitrary positions in the instance hierarchy. Such a link indicates that the linked objects are replaced by a single object. The links are directional; the direction indicates which object overloads its attribute values to create the final object.
• Direct Reference links: SBML attributes that reference other objects, for example VSHFLHV�on VSHFLHV5HIHUHQFH can be replaced by elements which enable objects in arbitrary positions in the instance hierarchy to be referenced.
The following diagrams show various cases of how a species type may be defined. Some of thesespecies type structures refer to each other.
A simple species type is indivisible
A species type can define a number ofexternal labelled binding sites
A species type is a graph of species typeinstances connected by bonds
t
species type instance
v
A
y vx
q pABC
00
unoccupiedbinding site
bond species typeinstanceidentifier
species typeidentifier
Some types of model use indexed collections of objects to describe biological phenomena [7]. We have developed a proposalfor an array extension to address this requirement [8] which has the following features:
• Arrays of 6SHFLHV, 3DUDPHWHU, 5XOH, 5HDFWLRQ, &RPSDUWPHQW�structures can be created. These arrays can have any number of dimensions where the range of each dimension is determined by two MathML integer expressions.
• An object of one of these types can have an H[LVWV�MathML expression which defines whether the object exists. This enables the definition of sparse arrays which turn provides a mechanism for defining connection patterns among array elements.
• Specific objects within an array can be referenced from other objects using a variant of the direct link structure introduced by the model composition proposal. An array selector operator performs a similar function in MathML.
• Arrays can be declared in a less verbose form (the implied form) which allows the array to 'inherit' dimensions from other arrays.
• Arrays of ,QVWDQFH�and /LQN�structures introduced by the Model Composition Proposal can be incorporated if required. This would allow for example the encoding of a model of tissue represented as an array of instances of cell submodels.
CellML has always had capability
Martin Ginkel & Jörg Stelling made proposals mid-2001, 2002
• Influenced by ProMoT/DIVA
Jonathan Webb also made a proposal in 2003
• Context of Bio-SPICE project
Andrew Finney made alternate proposal in 2003, kept up discussions through 2004
SBML efforts stalled in ‘05–’06 ...
Lucian Smith & Mike Hucka restarted effort in ’10
18
Composition as it is currently envisioned
Goals:
• Separate concepts of model definition vs instantiation of the model
- Can define single model definition & instantiate multiple copies
- Can create model libraries
• Selective replacement and/or deletion of entities
• Optional explicit interfaces (“ports”)
Latest proposal:
• http://www.sbml.org/Community/Wiki
• Preliminary implementation for libSBML is nearly ready
19
Scenario #1
File “X”
<sbml>
<model>
Model definition “A”
Submodel “B”
Pointer to def. “A”
Submodel “C”
Pointer to def. “A”
Single submodel template instantiated multiple times in the enclosing model
20
Arbitrary nesting—model instantiates another model definition that itself instantiates another model definition
Scenario #2
File “X”
<sbml>
<model>
Model definition “C”
Model definition “B”
Submodel “Z”
Pointer to def. “B”
Submodel “A”
Pointer to def. “C”
21
Models in external files
Scenario #3
File “X”<sbml>
<model>
Submodel “Z”
Pointer to def. “B”
External model definition “B”
File “Y”
<model>
22
Links/references/replacements
Model “outer”
S1 S2
Compartment “c”
Compartment “q”
Model “inner”
X1 X2
Model “outer”
Compartment “c”
S1 S2 X2 (from “inner”)
Implied model
23
Interfaces/ports
Model “outer”
S1 S2
Compartment “c”
Compartment “q”
Model “inner”
X1 X2
24
Spatial geometry
25
The problem
Core SBML only supports compartments containing well-stirred mixtures
• Lack support for defining geometric shape of compartments
• Lack support for nonuniform molecular distributions
• Lack support for expressing diffusion processes
The only way to do it portably in SBML is to fake it
• E.g., define a large number of small compartments...
26
The current proposal
Main components:
• Coordinate systems
• Patches of spatial geometries, called domains
- Domain = contiguous patch of volumetric space or surface patch
• Mapping of SBML compartments, species, & parameters to domains
• Molecular transport mechanisms (e.g., advection, diffusion)
• Mapping of molecular transport mechanisms to domains
Developed & implemented by Jim Schaff of the Virtual Cell group
• (Incomplete) proposal doc at http://www.sbml.org/Community/Wiki
• Beta test implementation for libSBML available today
27
Supports multiple alternatives for defining geometries
1. Analytic
2. Sampled field
3. Constructive solid geometry
4. Parametric shapes
28
— additional extensions ...
— additional extensions ...
29
Where to learn more
30
Where to learn more: SBML.org—the SBML portal
31
Find SBML software
Where to learn more: SBML.org—the SBML portal
31
Where to find curated, ready-to-run models
BioModels Databasehttp://biomodels.net/biomodels
32
Features of BioModels Database
Stores & serves quantitative models of biological interest
• Free, public resource
• Models must be described in peer-reviewed publication(s)
All models are curated by hand to reproduce published results
Imports & exports models in several formats
• SBML, CellML, SciLab, XPP, BioPAX
Today: 750+ models
Developed by Nicolas Le Novère’s group (EBI), funded by EBI & NIH
33
There’s more to modeling than SBML34
Representationformat
Model Procedures Results
Minimal inforequirements
Semantics—
Mathematical
Biological
SBRML
?
annotations annotations annotations
35
Representationformat
Model Procedures Results
Minimal inforequirements
Semantics—
Mathematical
Biological
SBRML
?
annotations annotations annotations
35
Annotations can answer questions:
• “What other identities (synonyms) does this entity have?”
• “What exactly is the process represented by equation ‘r17’?”
• “What role does constant ‘k3’ play in equation ‘r17’?”
• “What organism are we talking about?”
• ... etc. ...
Multiple annotations on same entity are common
Annotations add semantics and connections
36
Le Novère et al., Nature Biotech., 23(12), 2005.
37
Element in the model
Entity elsewhere (e.g., in a database)
relationship qualifier(optional)
MIRIAM cross-references are simple triples38
Annotations permit inter-database linking
39
Annotations permit inter-database linking
39
Annotations permit other capabilities
http://www.semanticsbml.org
40
MIRIAM cross-references are simple triples
Data source identifier
Data item identifier
Annotation qualifier{ }
(Required) (Required) (Optional)
URI chosen from agreed-upon list
Syntax & value space depends on data type
Format:
Controlled vocabulary term
Element in the model
Entity elsewhere (e.g., in a database)
relationship qualifier(optional)
41
http://www.ebi.ac.uk/miriam
MIRIAM Registry provides URI dictionary & resolver
Community-maintained
42
http://www.ebi.ac.uk/miriam
MIRIAM Registry provides URI dictionary & resolver
Community-maintained
42
New development: identifiers.org
Provides resolvable persistent URIs
• Unlike URNs, you can type it in a web browser
Implemented as additional layer on top of MIRIAM Registry
• Provides persistent URLs to data sources
• References data are kept in MIRIAM Registry
Example:
• EC Code entry #1.1.1.1
- MIRIAM URN: urn:miriam:ec-‐code:1.1.1
- identifiers.org URI: http://identifiers.org/ec-‐code/1.1.1.1
Developed by Nicolas Le Novère, Camille Laibe, Nick Juty @ EBI
43
Mathematical
Biological
Graphical
Discrete stochastic entities
Continuous lumped parameter
State transition
Mean field approximation
Model type
Model crea
tion
Model annotati
on
Model analys
is
Numeric
al resu
lts
Model life-cycle
Model representation level
Conc
ept d
ue to
Nic
olas
Le
Nov
ère
Other forms of representation44
Graphical representation of models
Today: broad variation in graphical notation used in biological diagrams
• Between authors, between journals, even people in same group
However, standard notations (as used in engineering) would offer benefits:
• Consistency = easier to read diagrams with less ambiguity
• Software support: verification of correctness, translation to math
45
SBGN = Systems Biology Graphical Notation
Goal: standardize the graphical notation in diagrams of biological processes
• Community-based development, à la SBML
Many groups participating
• Proceeding in “levels”
• 23 software tools so far
http://sbgn.org
46
National Institute of General Medical Sciences (USA)
European Molecular Biology Laboratory (EMBL)ELIXIR (UK)
Beckman Institute, Caltech (USA)
Keio University (Japan)
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Educ., Culture, Sports, Science and Tech.
BBSRC (UK)
National Science Foundation (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Air Force Office of Scientific Research (USA)
STRI, University of Hertfordshire (UK)
Molecular Sciences Institute (USA)
Agencies to thank for supporting SBML & BioModels.net
47
People on SBML Team & BioModels Team
SBML Team BioModels.net TeamMichael Hucka Nicolas Le NovèreSarah Keating Camille Laibe
Frank Bergmann Nicolas RodriguezLucian Smith Nick Juty
Nicolas Rodriguez Vijayalakshmi ChelliahLinda Taddeo Stuart MoodieAkiya Joukarou Sarah KeatingAkira Funahashi Maciej Swat
Kimberley Begley Lukas EndlerBruce Shapiro Chen LiAndrew Finney Harish DharuriBen Bornstein Lu Li
Ben Kovitz Enuo HeHamid Bolouri Mélanie CourtotHerbert Sauro Alexander BroicherJo Matthews Arnaud Henry
Maria Schilstra Marco Donizelli
VisionariesHiroaki Kitano
John Doyle
48
A huge thank you to the community
Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010
49
URLs
SBML http://sbml.org
BioModels Database http://biomodels.net/biomodels
MIRIAM http://biomodels.net/miriam
MIASE http://biomodels.net/miase
SED-ML http://biomodels.net/sed-ml
SBO http://biomodels.net/sbo
KiSAO http://www.ebi.ac.uk/compneur-srv/kisao/
TEDDY http://www.ebi.ac.uk/compneur-srv/teddy/
SBRML http://tinyurl.com/sbrml
SBGN http://sbgn.org
50
Representationformat
Model Procedures Results
Minimal inforequirements
Semantics—
Mathematical
Biological
SBRML
?
annotations annotations annotations
51
Representationformat
Model Procedures Results
Minimal inforequirements
Semantics—
Mathematical
Biological
SBRML
?
annotations annotations annotations
51
SED-ML = Simulation Experiment Description ML
Application-independent format
Captures procedures, algorithms, parameter values
• Steps to go from model to output
libSedML project developing API library
<sbml ...> ... <listOfCompartments> <compartment id="cell" size="1e-15" /> </listOfCompartments> <listOfSpecies> <species compartment="cell" id="S1" initialAmount="1000" /> <species compartment="cell" id="S2" initialAmount="0" /> <listOfSpecies> <listOfParameters> <parameter id="k" value="0.005" sboTerm="SBO:0000339" /> <listOfParameters> <listOfReactions> <reaction id="r1" reversible="false"> <listOfReactants> <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000010" /> ...
?
52