Modeling and Simulation of Genetic Regulatory Systems
Paper by Hidde de Jong, reviewed by Ulrich Basters and Christian Hahn
Seminar Bioinformatics - Modelling and Simulation of Genetic Regulatory Systems - Christian Hahn, Ulrich Basters, 08/27/2003
0.1 Overview
Introduction
Directed and undirected graphs
Bayesian networks
Boolean networks
Generalized logical networks
Non-linear ordinary differential equations
Piecewise linear differential equations
Qualitative differential equations
Partial differential equations
Stochastic master equations
Rule based formalisms
Conclusion
1.1 Genetic Regulatory Systems
In order to understand how organisms function at the molecular level, we need to know which genes are expressed, when and where in the organism, and to what extent.
The regulation of gene expression is achieved through genetic regulatory systems, structured by networks of interactions between DNA, RNA, proteins, and small molecules.
An intuitive understanding of the overall dynamics is hard to obtain
Consequence: formal methods and computer tools for modeling and simulation offer a possible approach
1.2 Genetic Regulatory Systems
Genes influence each other: they produce proteins that act as activators or repressors of other genes.
Complex system in which different concentrations of an agent trigger different actions
1.3 Motivation
Genetic regulatory systems (GRSes) are hard to understand in their full complexity
GOAL: Complexity reduction by appropriate models and formalisms
  Better understanding of GRSes
  Intuitive visualization of GRSes
  Better analysis of GRSes
  Models can give hints where to continue research for dependencies
  Models point out important parts of the system
  Gaining understanding of the emergence of complex patterns of behavior from interactions between genes in a regulatory network
1.4 Modeling Life-Cycle
Process model refining the development of a technique that models a GRN:
2.1 Directed and undirected Graphs – Motivation/Definition
Probably the most straightforward way to model a genetic regulatory network (GRN)
G = <V,E>, where V is a set of vertices and E is a set of edges
An edge is a pair <i,j>, where i,j ∈ V are the head and tail of the edge
Additional labels on the edges denote positive/negative influence
2.2 Directed and undirected Graphs - Summary
Advantages:
  Intuitive way of visualization
  Common and well explored graph algorithms can make biologically relevant predictions about GRSes:
    paths between genes may reveal missing regulatory interactions or provide clues about redundancy
    cycles in the network point at feedback relations
    connectivity characteristics give an indication of the complexity
    loosely connected subgraphs point at functional modules
Disadvantages:
  Time does not play a role
  Too much abstraction: very simplified model, far from reality
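To illustrate how a standard graph algorithm can point at feedback relations, here is a minimal Python sketch. The three-gene network, its edge signs, and the function names are hypothetical and chosen only for illustration; the cycle search is a plain depth-first enumeration, not a method taken from the reviewed paper.

```python
from typing import Dict, List, Tuple

# Hypothetical three-gene network: (regulator, target) -> sign of influence.
EDGES: Dict[Tuple[str, str], int] = {
    ("a", "b"): +1,  # a activates b
    ("b", "c"): +1,  # b activates c
    ("c", "a"): -1,  # c represses a
    ("a", "a"): +1,  # a activates itself
}

def successors(edges, node):
    """Targets directly regulated by `node`, in sorted order."""
    return sorted(t for (s, t) in edges if s == node)

def find_cycles(edges) -> List[List[str]]:
    """Enumerate simple cycles by depth-first search; each cycle is
    reported once, starting at its smallest vertex."""
    nodes = sorted({n for edge in edges for n in edge})
    cycles = []

    def dfs(start, node, path):
        for nxt in successors(edges, node):
            if nxt == start:
                cycles.append(path[:])
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])

    for start in nodes:
        dfs(start, start, [start])
    return cycles

def cycle_sign(edges, cycle):
    """Product of the edge signs along a cycle: +1 marks a positive
    feedback loop, -1 a negative one."""
    sign = 1
    for i, src in enumerate(cycle):
        dst = cycle[(i + 1) % len(cycle)]
        sign *= edges[(src, dst)]
    return sign
```

Calling `find_cycles(EDGES)` lists the feedback loops of the toy network, and `cycle_sign` classifies each of them as positive or negative feedback.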
3.1 Bayesian Networks - Definition
Directed acyclic graph G=<V,E>
Vertices i ∈ V, 1 ≤ i ≤ n, represent genes or other elements and correspond to random variables Xi
Each Xi has a conditional distribution p(Xi | parents(Xi)), where parents(Xi) denotes the direct regulators of i
Conditional independence: i(Xi; Y | Z) expresses the fact that Xi is independent of Y given Z
3.2 Bayesian Networks – Markov Assumption
The graph encodes the Markov assumption, stating that for every gene i in G the conditional independence i(Xi; non-descendants(Xi) | parents(Xi)) holds
The method is used to analyse dependencies between genes; it is not applicable to system simulation
Learning techniques rely on a matching score to evaluate networks against the data and search for the network with the optimal score
Graphs are said to be equivalent if they imply the same set of independencies, thus forming an equivalence class (useful for determining important subgraphs)
Looking at Markov and order relations between pairs of genes may point to a relationship between the genes
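As a worked equation: the chain rule of probability together with the Markov assumption lets the joint distribution be written as a product of the conditional distributions attached to the vertices. The three-gene chain 1 → 2 → 3 used as the example is hypothetical.

$$p(X_1, \ldots, X_n) = \prod_{i=1}^{n} p\bigl(X_i \mid \mathrm{parents}(X_i)\bigr)$$

$$\text{Chain } 1 \to 2 \to 3: \quad p(X_1, X_2, X_3) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_2)$$

In the chain, X3 is independent of X1 given X2, which is why the last factor does not condition on X1.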
3.3 Bayesian Networks – Summary
Advantages
  Attractive because of its solid basis in statistics (makes it possible to deal with stochastic aspects and noisy measurements in a natural way)
  Applicable even if only incomplete knowledge about the system is available
  Highlights important parts of the system – usually only a few genes play an important role in large systems
Disadvantages
  Incomplete data under-determine the network (at best a few dozen experiments provide information on the transcription of thousands of genes)
  The search is known to be NP-hard. Heuristics are used, but they do not guarantee finding a globally optimal solution
  Static network – leaves out dynamic aspects → addressed by Dynamic Bayesian Networks
4.1 Boolean Networks - Definition
The state of a gene can be expressed by a Boolean variable: active (=1) or inactive (=0)
Interactions between genes can be represented by Boolean functions that calculate the state of a gene from the activation of other genes
Results in a Boolean Network:
4.2 Boolean Networks – Definition/Properties
Method is similar to logic circuitry
An n-vector of variables in a Boolean Network represents the state of a regulatory system of n elements, each of which has value 0 or 1
So the system has 2^n possible states
The state of an element at time point t+1 is computed by a Boolean function or rule from the state of k of the n elements at time point t
The function maps k inputs to an output value
4.3 Boolean Networks – Properties
Transitions between states are deterministic and synchronous (outputs of elements are updated simultaneously)
Sequence of states forms a trajectory of the system
As the number of states is finite, a trajectory will reach either a steady state (point attractor) or a state cycle (dynamic attractor)
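The synchronous update and the search for an attractor can be sketched in a few lines of Python. The three-gene rule set below is a hypothetical example, not one from the reviewed paper.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical 3-gene network; each rule computes a gene's next state
# from the current state (x0, x1, x2).
RULES: List[Callable[[State], int]] = [
    lambda s: int(not s[2]),  # gene 0 is repressed by gene 2
    lambda s: s[0],           # gene 1 is activated by gene 0
    lambda s: s[1],           # gene 2 is activated by gene 1
]

def step(state: State) -> State:
    """Synchronous update: all genes are updated simultaneously."""
    return tuple(rule(state) for rule in RULES)

def find_attractor(state: State) -> List[State]:
    """Follow the trajectory until a state repeats and return the
    repeating part (one state for a point attractor, several for a
    state cycle)."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]
```

Because the state space is finite, the loop in `find_attractor` always terminates; the negative feedback loop in this toy network yields state cycles rather than a steady state.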
4.4 Boolean Networks – Summary
Advantages
  Efficient analysis of large regulatory networks
  Positive/negative feedback cycles can be modeled with Boolean Networks
Disadvantages
  Strong simplifying assumptions – a gene is either on or off, with no intermediate states
  Transitions are assumed to occur synchronously – this is usually not the case, so certain behaviors may not be predicted by the simulation algorithm
  There are situations where the Boolean idealisation is not appropriate – more general methods are required
5.1 Generalized Logical Networks (GLN) – Definition
Generalizes Boolean Networks – allows variables to have more than 2 values
Transitions between states occur asynchronously
Discrete, so-called logical variables are abstractions of real concentration values xi
Possible values of element i are defined by the thresholds of its influence on other elements – if an element influences p other elements, it may have p different thresholds
5.2 GLN – Definition
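Formally:
If an element i influences p other elements, then it will have p distinct thresholds $\theta_i^{(1)} < \theta_i^{(2)} < \ldots < \theta_i^{(p)}$
The logical variable $\hat{x}_i$ has the possible values {0,...,p} and is defined by:
$$\hat{x}_i = \begin{cases} 0 & \text{if } x_i < \theta_i^{(1)} \\ 1 & \text{if } \theta_i^{(1)} < x_i < \theta_i^{(2)} \\ \;\vdots \\ p & \text{if } x_i > \theta_i^{(p)} \end{cases}$$
The vector $\hat{x} = [\hat{x}_1, \ldots, \hat{x}_n]$ denotes the logical state of the regulatory network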
5.3 GLN – Definition
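The pattern of an interaction is described by logical equations of the form:
$$X_i(t) = b_i(\hat{x}(t)), \quad 1 \le i \le n$$
where $\hat{x}$ is the logical state and $b_i$ a logical function
$X_i$ is called the image of $\hat{x}_i$; it denotes the value towards which $\hat{x}_i$ tends when the logical state is $\hat{x}$
Positive and negative feedback loops can be modeled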
Refinement of simple on/off variables in Boolean Networks
5.4 GLN – Properties
A logical steady state occurs when the logical state $\hat{x}$ equals its image $X$:
$$\hat{x} = X$$
Since the number of logical states is finite, one can test exhaustively for logical steady states; all other states are called transient logical states
If the system is in a transient logical state, it will make a transition into another logical state
Since a logical variable will move into the direction of its image, the successor states can be deduced by comparing the value of a logical variable with that of its image
The logical states and transitions among them can be organized in a state transition graph
More refined analyses of state transitions can take into account time delays arising from transcription, translation and transport
Improves standard Boolean Network model
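The test for logical steady states and the construction of the state transition graph can be sketched as follows. The two-element mutual-repression network, the number of levels per element, and the logical functions in `image` are hypothetical; the update rule (one variable at a time moves one level towards its image) is one simple reading of the asynchronous transitions described above.

```python
from itertools import product
from typing import List, Tuple

State = Tuple[int, int]
MAX_LEVEL = (2, 1)  # hypothetical: element 0 has 2 thresholds, element 1 has 1

def image(state: State) -> State:
    """Hypothetical logical equations giving the image of each variable."""
    x0, x1 = state
    target0 = 2 if x1 == 0 else 0  # element 1 represses element 0
    target1 = 1 if x0 < 2 else 0   # element 0 represses element 1 at level 2
    return (target0, target1)

def successors(state: State) -> List[State]:
    """Asynchronous transitions: each variable that differs from its
    image yields one successor in which it has moved one level closer."""
    result = []
    for i, (value, target) in enumerate(zip(state, image(state))):
        if value != target:
            nxt = list(state)
            nxt[i] = value + (1 if target > value else -1)
            result.append(tuple(nxt))
    return result

ALL_STATES = list(product(*(range(m + 1) for m in MAX_LEVEL)))
# Logical steady states: the state equals its image.
STEADY = [s for s in ALL_STATES if image(s) == s]
# State transition graph: logical state -> list of successor states.
GRAPH = {s: successors(s) for s in ALL_STATES}
```

Steady states are exactly the states without outgoing transitions in `GRAPH`; all other states are transient.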
6.1 Nonlinear Ordinary Differential Equations - Definition
Models the concentrations of RNA, proteins and other molecules by time-dependent variables
Gene regulation is modeled by rate equations, expressing the rate of production of a component as a function of the concentrations of other components
Rate equations have the following form:
$$\frac{dx_i}{dt} = f_i(x), \quad 1 \le i \le n$$
where x = [x1 , ... , xn] ≥ 0 denotes the vector of concentrations and ƒi: Rn → R is a usually non-linear function
Discrete time delays τi1, ... , τin > 0 can also be represented:
$$\frac{dx_i}{dt} = f_i\bigl(x_1(t-\tau_{i1}), \ldots, x_n(t-\tau_{in})\bigr), \quad 1 \le i \le n$$
6.2 ODE - Definition
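Goal: Specifying the functions ƒi, for example:
$$\frac{dx_1}{dt} = k_{1,n}\, r(x_n) - \gamma_1 x_1, \qquad \frac{dx_i}{dt} = k_{i,i-1}\, x_{i-1} - \gamma_i x_i, \quad 1 < i \le n$$
k1,n, k2,1, ... , kn,n-1 > 0 are production constants and γ1, ... , γn > 0 are degradation constants
The rate equations express a balance between the number of molecules appearing and disappearing per unit time
For x1, a regulation function r: R → R is involved, whereas the production of xi for i > 1 increases linearly in xi-1
An often used regulation function is the so-called Hill curve:
$$h^+(x_j, \theta_j, m) = \frac{x_j^m}{x_j^m + \theta_j^m}$$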
6.3 ODE - Definition
θj > 0 describes the threshold for the regulatory influence of xj on a target gene
m > 0 is a steepness parameter
The h+ function ranges from 0 to 1
An increase in xj will tend to increase the expression rate of a gene (activation); h+ approaches 1 as xj → ∞
In order to express that an increase of xj will tend to decrease the expression rate (inhibition), the regulation function is replaced by:
$$h^-(x_j, \theta_j, m) = 1 - h^+(x_j, \theta_j, m)$$
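A minimal numerical sketch of such rate equations, assuming a chain of n components in which the last one inhibits production of the first through a decreasing Hill function. The parameter values, the forward-Euler scheme, and the function names are illustrative choices, not values or methods from the reviewed paper.

```python
def h_plus(x: float, theta: float, m: float) -> float:
    """Hill activation curve, rising from 0 to 1 around the threshold theta."""
    return x**m / (x**m + theta**m)

def h_minus(x: float, theta: float, m: float) -> float:
    """Hill inhibition curve, the complement of h_plus."""
    return 1.0 - h_plus(x, theta, m)

def simulate(x0, k, gamma, theta, m, dt=0.01, steps=5000):
    """Forward-Euler integration of
        dx1/dt = k[0] * h_minus(xn) - gamma[0] * x1
        dxi/dt = k[i] * x_{i-1}    - gamma[i] * xi   (i > 1)
    and return the concentrations after `steps` steps."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [k[0] * h_minus(x[-1], theta, m) - gamma[0] * x[0]]
        dx += [k[i] * x[i - 1] - gamma[i] * x[i] for i in range(1, n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Hypothetical parameters: three components, unit rate constants.
final = simulate([0.0, 0.0, 0.0], k=[1.0, 1.0, 1.0],
                 gamma=[1.0, 1.0, 1.0], theta=1.0, m=2)
```

With these parameter values the system settles to a steady state; a fixed-step Euler scheme is used only to keep the sketch dependency-free, and a proper ODE solver would be preferable in practice.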
6.4 ODE Properties
Advantages
  More "realistic" way of modeling
Disadvantages
  Lack of in vivo or in vitro measurements of kinetic parameters in the rate equations
  Numerical parameter values are available for only a handful of well-studied systems (e.g. phage λ)
  In most cases parameter values had to be chosen such that the models were able to reproduce the observed qualitative behavior
  For larger models, finding appropriate values may be difficult
Solution
  The growing availability of data could alleviate the problem to some extent
7.1 Piecewise-Linear Differential Equations (PLDE) - Definition
Special case of the rate equations, with two simplifications:
  Interactions are modeled by directly relating the expression levels of genes in the network
  Continuous sigmoid curves are approximated by discontinuous step functions
PLDEs have the following form:
$$\frac{dx_i}{dt} = g_i(x) - \gamma_i x_i, \quad 1 \le i \le n$$
where xi denotes the cellular concentration of the product of gene i and γi > 0 its degradation rate
The function gi: Rn≥0 → R≥0 is defined as:
$$g_i(x) = \sum_{l \in L} k_{il}\, b_{il}(x)$$
where kil > 0 is a rate parameter and bil: Rn≥0 → {0,1} is a combination of step functions
bil is the arithmetic equivalent of a Boolean function, expressing the conditions under which gene i is expressed at rate kil
7.2 PLDE - Graphical simplification
Consider an n-dimensional hyperbox defined by:
$$0 \le x_i \le max_i, \quad 1 \le i \le n$$
Assume that for all threshold concentrations θik of the protein encoded by gene i it holds that θik < maxi
The (n-1)-dimensional hyperplanes defined by the thresholds xi = θik divide the box into orthants
In each orthant the PLDEs reduce to ODEs with a constant production term μi composed of the rate parameters kil:
$$\frac{dx_i}{dt} = \mu_i - \gamma_i x_i$$
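Inside one orthant the equations are linear with a constant production term, so they can be solved in closed form: each concentration relaxes exponentially towards μi/γi. The sketch below illustrates this for a hypothetical two-gene mutual-inhibition network; the parameter values and function names are assumptions made for the example.

```python
import math

def step_minus(x: float, theta: float) -> int:
    """Decreasing step function: 1 below the threshold, 0 otherwise."""
    return 1 if x < theta else 0

# Hypothetical two-gene mutual-inhibition network.
K = (2.0, 2.0)      # rate parameters
GAMMA = (1.0, 1.0)  # degradation rates
THETA = (1.0, 1.0)  # threshold concentrations

def production(x):
    """Constant production terms mu_i in the orthant containing x."""
    return (K[0] * step_minus(x[1], THETA[1]),
            K[1] * step_minus(x[0], THETA[0]))

def focal_point(x):
    """Point mu_i / gamma_i towards which the orthant's trajectories tend."""
    return tuple(m / g for m, g in zip(production(x), GAMMA))

def evolve_in_orthant(x, t):
    """Closed-form solution of dx_i/dt = mu_i - gamma_i * x_i, valid only
    while the trajectory stays inside the orthant of the start point x."""
    mu = production(x)
    return tuple(m / g + (xi - m / g) * math.exp(-g * t)
                 for xi, m, g in zip(x, mu, GAMMA))
```

If the focal point lies in the same orthant as the start point, as for `x = (1.5, 0.2)` here, the trajectory never crosses a threshold and the focal point is a steady state of the whole system.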
7.3 PLDE - Example
State equations corresponding to the orthant 0 ≤ x1 < θ21, θ12 < x2 ≤ max2 and θ33 < x3 ≤ max3
8.1 Qualitative Differential Equations (QDE) - Definition
Incomplete understanding of GRNs and absence of quantitative knowledge → need for qualitative simulation techniques
Idea behind QDE: abstract a discrete description from a continuous model
The discrete abstraction is then used to draw conclusions about the dynamics of the system
QDEs are abstractions of ODEs of the form:
$$\frac{dx_i}{dt} = f_i(x), \quad 1 \le i \le n$$
where ƒi: Rn → R and the variables x take qualitative values, each composed of a qualitative magnitude and direction
8.2 QDE - Properties
The qualitative magnitude of a variable xi is a discrete abstraction of its real value; the qualitative direction is the sign of its derivative
The function ƒi is abstracted into a set of qualitative constraints
The QSIM algorithm generates a tree of qualitative behaviors from an initial qualitative state consisting of qualitative values
Each behavior in the tree describes a possible sequence of state transitions from the initial state
Every qualitatively distinct behavior of the ODE corresponds to a behavior in the tree generated from the QDE (the reverse may not be true)
8.3 QDE - Summary
Problems
  Limited scalability: behavior trees quickly grow out of bounds
Solutions
  Using a simulation algorithm tailored to the equations, larger networks with complex feedback loops can be treated
Advantages
  Allows weak numerical information to be integrated
  Integration of numerical information is more difficult to achieve in logical approaches
8.4 QDE – HYPGENE / GENSIM
Qualitative process theory is used for construction and revision of gene regulation models
User definition and knowledge base are used by GENSIM to simulate a proposed experiment
If the predictions do not match the observations, the HYPGENE algorithm generates hypotheses to explain the discrepancies
HYPGENE revises assumptions about the experimental conditions
Helps to refine the model
Both algorithms have been able to partially reproduce the experimental reasoning about the attenuation mechanism regulating the synthesis of tryptophan in E. coli
9.1 Partial Differential Equations (PDE) – Motivation
So far, regulatory systems have been assumed to be spatially homogeneous
In certain situations it is important to drop this assumption
Distinguish between different compartments of a cell, for example nucleus and cytoplasm, or between multiple cells affecting each other
Diffusion of regulatory proteins or metabolites from one compartment to another
This is a critical feature in embryonic development
9.2 Partial Differential Equations (PDE) – Definition
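The reaction-diffusion equations (for a row of p cells with cell index l and diffusion constants δi):
$$\frac{dx_i^{(l)}}{dt} = f_i\bigl(x^{(l)}\bigr) + \delta_i \bigl(x_i^{(l+1)} - 2 x_i^{(l)} + x_i^{(l-1)}\bigr), \quad 1 \le i \le n,\ 1 < l < p$$
Can be adapted to other one- or higher-dimensional spatial configurations
If the number of cells is large enough, the discrete variable l can be replaced by a continuous position variable λ ranging over the size of the system
Concentration variables are now defined as functions of λ and t, and the reaction-diffusion equations become a partial differential equation (PDE):
$$\frac{\partial x_i}{\partial t} = f_i(x) + \delta_i \frac{\partial^2 x_i}{\partial \lambda^2}, \quad 1 \le i \le n$$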
Using modes or eigenfunctions of the Laplacian operator gives:
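The discrete reaction-diffusion equation for a row of cells can be simulated directly. The sketch below assumes the usual discrete form, in which each cell exchanges substance with its two neighbours in proportion to the concentration difference. It handles a single substance, uses a forward-Euler step, and treats the two end cells as zero-flux boundaries; these choices, the parameter values, and the function names are assumptions made for illustration.

```python
def diffuse_step(x, delta, dt, reaction=lambda c: 0.0):
    """One forward-Euler step for a single substance in a row of cells.

    x[l] is the concentration in cell l. The end cells exchange substance
    only with their single neighbour (zero-flux boundary, an assumption).
    `reaction` is the local reaction term, zero by default."""
    p = len(x)
    new = []
    for l in range(p):
        left = x[l - 1] if l > 0 else x[l]
        right = x[l + 1] if l < p - 1 else x[l]
        laplacian = left - 2.0 * x[l] + right
        new.append(x[l] + dt * (reaction(x[l]) + delta * laplacian))
    return new

def simulate(x, delta=1.0, dt=0.1, steps=2000):
    """Iterate pure diffusion; delta * dt must stay at or below 0.5 for
    the explicit scheme to remain stable."""
    for _ in range(steps):
        x = diffuse_step(x, delta, dt)
    return x

# All substance starts in the first of five cells.
final = simulate([1.0, 0.0, 0.0, 0.0, 0.0])
```

Without a reaction term the total amount of substance is conserved and the profile flattens out; pattern formation requires a non-trivial `reaction` term coupling at least two substances, which this sketch leaves out.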
9.3 Partial Differential Equations (PDE) – Definition
In a two-gene activator-inhibitor system, the product of gene 1, the activator, must positively regulate itself; the product of gene 2, the inhibitor, must negatively regulate gene 1
Activator-inhibitor systems have been used extensively to study the emergence of segmentation patterns in the early Drosophila embryo
Observed spatial and temporal expression patterns of genes closely resemble the modes of the model
Numerical simulations demonstrated that some aspects of stripe formation in the Drosophila blastoderm can indeed be reproduced this way
9.4 Partial Differential Equations (PDE) – Properties
The equations shown are not applicable in all situations; more complex equations have been formulated for several special cases
Predictions are quite sensitive to the shape of the spatial domain, the boundary conditions and the chosen parameter values
Models need to be simple and are usually strong abstractions of biological processes (e.g. considering only the concentrations of a few gene products)
For larger and more complex models computational costs for finding an optimal fit between data and parameters may be prohibitively high
10.1 Stochastic Master Equations - Motivation
Differential equations describe gene regulation in great detail
Differential equations presuppose that the concentrations of substances vary continuously and deterministically
Both assumptions are questionable in the case of gene regulation
So, we prefer to use a discrete and stochastic approach
Discrete amounts X of molecules are taken as state variables, and a joint probability distribution p(X, t) is introduced to express the probability that at time t the cell contains X1 molecules of the first species, X2 of the second, etc.
10.2 Stochastic Master Equations - Definition
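The time evolution of p(X, t) can be expressed as:
$$p(X, t+\Delta t) = p(X, t)\Bigl(1 - \sum_{j=1}^{m} \alpha_j \Delta t\Bigr) + \sum_{j=1}^{m} \beta_j \Delta t$$
where
  m is the number of reactions
  αjΔt is the probability that reaction j will occur in the interval [t, t+Δt], given that the system is in state X at time t
  βjΔt is the probability that reaction j will bring the system into state X from another state in [t, t+Δt]
Rearranging and taking the limit Δt → 0 gives the master equation:
$$\frac{\partial p(X,t)}{\partial t} = \sum_{j=1}^{m} \bigl(\beta_j - \alpha_j\, p(X,t)\bigr)$$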
10.3 Stochastic Master Equations - Properties
Master equations can be approximated by stochastic differential equations
An alternative approach is to disregard the master equation and directly simulate the time evolution of the system
Stochastic simulation approach:
  Determines when the next reaction occurs and of which type it will be
  Revises the state in accordance with this reaction
  Continues at the resulting next state
Master equations deal with average behavior, whereas stochastic simulation provides information on individual behaviors
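The three steps of the stochastic simulation approach can be sketched for the simplest possible system: a single molecular species that is produced at a constant rate and degraded at a rate proportional to its amount. The reaction system, the rate constants, and the function name are hypothetical; the sampling scheme follows the commonly used direct method of stochastic simulation.

```python
import math
import random

def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Stochastic simulation of production (rate k_prod) and degradation
    (rate k_deg * X) of one molecular species up to time t_end."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod    # propensity of the production reaction
        a_deg = k_deg * x  # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break          # no reaction can occur any more
        # Step 1a: the waiting time to the next reaction is exponential.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        # Step 1b: choose the reaction type in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1         # Step 2: revise the state (one molecule produced)
        else:
            x -= 1         # Step 2: revise the state (one molecule degraded)
        # Step 3: record the new state and continue from it.
        times.append(t)
        counts.append(x)
    return times, counts

times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, x0=0, t_end=5.0)
```

Each call produces one individual trajectory; averaging many runs with different seeds approximates the distribution described by the master equation.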
10.4 Stochastic Master Equations - Summary
Advantages
  Simulation results in closer approximations to the molecular reality of gene regulation
Disadvantages
  The benefit of stochastic simulation is not always evident
  Requires detailed knowledge of the reaction mechanisms
  Simulation is costly
11.1 Rule-Based Formalisms (RBF) - Definition
Knowledge-based or rule-based simulation formalisms permit rich knowledge about a system to be expressed in a single formalism
Consist of two components: facts and rules
The rules consist of two parts: condition and action
Advantages
  Capability to deal with a richer variety of biological knowledge
Disadvantages
  Difficulties in maintaining a consistent knowledge base
  RBF cannot compete quantitatively with the formalisms discussed earlier
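The split into facts and condition/action rules can be illustrated with a tiny forward-chaining loop. The knowledge base below is a deliberately simplified, hypothetical description of a lactose-inducible operon, and the representation of a rule as a (condition, added fact) pair is an assumption made for this sketch.

```python
from typing import Callable, List, Set, Tuple

Facts = Set[str]
Rule = Tuple[Callable[[Facts], bool], str]  # (condition, fact added by the action)

# Hypothetical, simplified knowledge base for a lactose-inducible operon.
RULES: List[Rule] = [
    (lambda f: "lactose present" in f, "repressor inactive"),
    (lambda f: "repressor inactive" in f and "glucose absent" in f,
     "operon transcribed"),
    (lambda f: "operon transcribed" in f, "enzyme produced"),
]

def forward_chain(facts: Facts, rules: List[Rule]) -> Facts:
    """Apply the rules repeatedly until no rule adds a new fact."""
    facts = set(facts)  # work on a copy of the initial facts
    changed = True
    while changed:
        changed = False
        for condition, new_fact in rules:
            if new_fact not in facts and condition(facts):
                facts.add(new_fact)
                changed = True
    return facts
```

For example, `forward_chain({"lactose present", "glucose absent"}, RULES)` derives that the enzyme is produced, while leaving out `"glucose absent"` stops the chain after the repressor is inactivated.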
12.1 Conclusions
Major difficulties in modeling and simulating genetic regulatory networks:
  Biochemical reaction mechanisms are not known or are incompletely known
  Quantitative information on molecular concentrations is only seldom available
Formalisms discussed allow GRSes to be modeled in quite different ways, depending on the application:
12.2 Expectations
The emergence of new experimental techniques promises to relieve the data bottleneck
Increasing knowledge of the molecular mechanisms allows regulatory systems to be modeled at a finer level of granularity
The use of quantitative models permits larger systems to be studied at a higher precision
These developments will bring researchers nearer to the ultimate goal: to use models that integrate gene regulation with metabolism, signal transduction, replication and repair, and a variety of other cellular processes
Each of the approaches above has its merits, but neither of them seems sufficient in itself
It can be expected that a combination of the two approaches, exploiting a wide range of structural and functional information on regulatory networks, will be most effective
13.1 References / Acknowledgements
References (all images are also taken from this paper)
  Hidde de Jong, Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67-103.
Acknowledgements
  Thanks to Marite Sirava and Thomas Schäfer at ZBI of Universität des Saarlandes for supporting us in preparing this talk