
Modeling and Simulation of Genetic Regulatory Systems

paper by Hidde de Jong, reviewed by

Ulrich Basters and Christian Hahn

Seminar Bioinformatics - Modelling and Simulation of Genetic Regulatory Systems - Christian Hahn, Ulrich Basters, 08/27/2003


0.1 Overview

Introduction

Directed and undirected graphs

Bayesian networks

Boolean networks

Generalized logical networks

Non-linear ordinary differential equations

Piecewise linear differential equations

Qualitative differential equations

Partial differential equations

Stochastic master equations

Rule based formalisms

Conclusion



1.1 Genetic Regulatory Systems

In order to understand the functioning of organisms on the molecular level, we need to know which genes are expressed, when and where in the organism, and to what extent.

The regulation of gene expression is achieved through genetic regulatory systems, structured by networks of interactions between DNA, RNA, proteins, and small molecules.

An intuitive understanding of the overall dynamics is hard to obtain

Consequence: formal methods and computer tools for modeling and simulation might be an approach



1.2 Genetic Regulatory Systems

Genes influence each other, as they produce proteins that act as activators or repressors of other genes.

Complex system where different concentrations of an agent trigger different actions



1.3 Motivation

Genetic regulatory systems are hard to understand in their whole complexity

GOAL: Complexity reduction by appropriate models and formalisms

Better understanding of GRSes

Intuitive visualization of GRSes

Better analysis of GRSes

Models can give hints where to continue research for dependencies

Models point out important parts of the system

Gaining understanding of the emergence of complex patterns of behavior from interactions between genes in a regulatory network



1.4 Modeling Life-Cycle

Process model refining the development of a technique that models a GRN:



2.1 Directed and undirected Graphs – Motivation/Definition

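Probably the most straightforward way to model a GRN

Graph G = <V, E>, where V is the set of vertices and E the set of edges

A directed edge is a tuple <i, j> with i, j ∈ V, where i and j denote the head and tail of the edge

Additional labels on the edges denote positive/negative influence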



2.2 Directed and undirected Graphs - Summary

Advantages:

Intuitive way of visualization

Common and well explored graph algorithms can make biologically relevant predictions about GRSes:

paths between genes may reveal missing regulatory interactions or provide clues about redundancy

cycles in the network point at feedback relations

connectivity characteristics give an indication of the complexity

loosely connected subgraphs point at functional modules

Disadvantages:

Time does not play a role

Too much abstraction: very simplified model, far from reality
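The graph analyses listed above (paths between genes, feedback cycles) can be sketched in a few lines of Python. This is only an illustration: the four-gene network, its edge signs, and all function names are made up and not taken from the paper.

```python
from collections import deque

# Hypothetical toy network: (regulator, target) -> sign of the influence.
EDGES = {
    ("a", "b"): "+",
    ("b", "c"): "-",
    ("c", "a"): "+",
    ("c", "d"): "+",
}

def successors(edges):
    """Adjacency list of the directed graph G = <V, E>."""
    adj = {}
    for (i, j) in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, [])
    return adj

def find_path(edges, source, target):
    """Breadth-first search for a shortest regulatory path, or None."""
    adj = successors(edges)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def path_sign(edges, path):
    """Net influence along a path: an odd number of '-' edges gives '-'."""
    negatives = sum(edges[(u, v)] == "-" for u, v in zip(path, path[1:]))
    return "-" if negatives % 2 else "+"

def on_feedback_cycle(edges, gene):
    """A gene lies on a cycle if one of its targets can reach it again."""
    adj = successors(edges)
    return any(find_path(edges, nxt, gene) is not None for nxt in adj[gene])
```

In this toy network, gene a reaches gene d through one negative edge, so its net influence on d is negative, and a lies on a feedback cycle while d does not.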



3.1 Bayesian Networks - Definition

Directed acyclic graph G = <V, E>

Vertices i ∈ V, 1 ≤ i ≤ n, represent genes or other elements and correspond to random variables Xi

For each Xi, a conditional distribution p(Xi | parents(Xi)) is defined, where parents(Xi) denotes the direct regulators of i

Conditional independency: i(Xi; Y | Z) expresses the fact that Xi is independent of Y given Z



3.2 Bayesian Networks – Markov Assumption

The graph encodes the Markov assumption, stating that for every gene i in G the conditional independency i(Xi; nondescendants(Xi) | parents(Xi)) holds

The method is used to analyse dependencies between genes; it is not applicable to the simulation of a system

Techniques rely on a matching score to evaluate networks and search for the network with the optimal score

Graphs are said to be equivalent, if they imply the same set of independencies thus forming an equivalence class (useful for determining important subgraphs)

Looking at Markov and order relations between pairs of genes may point to a relationship between the genes
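A small sketch may help to see what the Markov assumption buys: the joint distribution factorizes into the conditional distributions p(Xi | parents(Xi)). The three-gene network and all probability values below are invented for illustration, and the inference is done by brute-force enumeration, which only works for tiny networks.

```python
from itertools import product

# Hypothetical three-gene DAG: A regulates B and C (0 = off, 1 = on).
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}

# Made-up conditional probability tables: P(gene = 1 | parent values).
P_ON = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.8},
    "C": {(0,): 0.2, (1,): 0.7},
}

def joint(assignment):
    """p(X) as the product of p(Xi | parents(Xi)) over all genes."""
    prob = 1.0
    for gene, parents in PARENTS.items():
        key = tuple(assignment[p] for p in parents)
        p_on = P_ON[gene][key]
        prob *= p_on if assignment[gene] == 1 else 1.0 - p_on
    return prob

def conditional(query, evidence=None):
    """P(query | evidence) by brute-force summation over all states."""
    evidence = evidence or {}
    genes = list(PARENTS)
    num = den = 0.0
    for values in product((0, 1), repeat=len(genes)):
        state = dict(zip(genes, values))
        if any(state[g] != v for g, v in evidence.items()):
            continue
        p = joint(state)
        den += p
        if all(state[g] == v for g, v in query.items()):
            num += p
    return num / den
```

Here `conditional({"B": 1}, {"A": 1})` and `conditional({"B": 1}, {"A": 1, "C": 1})` agree: once the parent A is known, the non-descendant C carries no further information about B.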



3.3 Bayesian Networks – Summary

Advantages:

Attractive because of its solid basis in statistics (makes it possible to deal with stochastic aspects and noisy measurements in a natural way)

Also applicable if only incomplete knowledge about the system is available

Shows up important parts of the system – usually only a few genes play an important role in large systems

Disadvantages:

Incomplete knowledge under-determines the network (at best a few dozen experiments provide information on the transcription of thousands of genes)

The search for the optimal network is NP-hard. Heuristics are used, but they do not guarantee finding a globally optimal solution

Static network – leaves out dynamic aspects → fixed by Dynamic Bayesian Networks



4.1 Boolean Networks - Definition

The state of a gene can be expressed by a Boolean variable indicating that it is active (= 1) or inactive (= 0)

Interactions between genes can be represented by Boolean functions that calculate the state of a gene from the activation of other genes

Results in a Boolean Network:



4.2 Boolean Networks – Definition/Properties

The method is similar to circuitry

An n-vector $\hat{x}$ of variables in a Boolean Network represents the state of a regulatory system of n elements, each of which has the value 0 or 1

So the system has 2^n possible states

The state of an element at time point t+1 is computed by a Boolean function or rule from the state of k of the n elements at time point t:

$$\hat{x}_i(t+1) = b_i(\hat{x}(t)), \quad 1 \le i \le n$$

where $b_i$ maps k inputs to an output value



4.3 Boolean Networks – Properties

Transitions between states are deterministic and synchronous (outputs of elements are updated simultaneously)

A sequence of states forms a trajectory of the system

As the number of states is finite, a trajectory will reach either a steady state (point attractor) or a state cycle (dynamic attractor)
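The synchronous update and the search for an attractor can be sketched as follows. The three rules are a hypothetical example (a negative feedback loop), not a network from the paper.

```python
# Hypothetical three-gene network; each rule maps the state at time t
# to the value of one gene at time t+1.
RULES = (
    lambda s: s[2],        # gene 0 copies gene 2
    lambda s: s[0],        # gene 1 copies gene 0
    lambda s: not s[1],    # gene 2 is repressed by gene 1
)

def step(state):
    """Synchronous update: all genes are updated simultaneously."""
    return tuple(int(bool(rule(state))) for rule in RULES)

def attractor(state):
    """Follow the trajectory until a state repeats; return the cycle."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]
```

Because there are only 2^3 = 8 states, the loop in `attractor` must terminate. A returned list of length 1 is a point attractor; a longer list is a state cycle. For these particular rules every trajectory ends in a state cycle.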



4.4 Boolean Networks – Summary

Advantages:

Efficient analysis of large regulatory networks

Positive/negative feedback cycles can be modeled with Boolean Networks

Disadvantages:

Strong simplifying assumptions – a gene is either on or off, with no states in between

Transitions are assumed to occur synchronously – this is usually not the case, so certain behaviors may not be predicted by the simulation algorithm

There are situations where the Boolean idealisation is not appropriate – more general methods are required



5.1 Generalized Logical Networks (GLN) – Definition

Generalizes Boolean Networks – allows variables to have more than 2 values

Transitions between states occur asynchronously

Discrete, so-called logical variables are abstractions of the real concentration values xi

The possible values of element i are defined by the thresholds of its influence on other elements – if an element influences p other elements, it may have up to p different thresholds



5.2 GLN – Definition

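Formally: if an element i influences p other elements, it may have as many as p distinct thresholds:

$$\theta_i^{(1)} < \theta_i^{(2)} < \dots < \theta_i^{(p)}$$

The logical variable $\hat{x}_i$ then has the possible values {0,...,p} and is defined by:

$$\hat{x}_i = \begin{cases} 0, & x_i < \theta_i^{(1)} \\ 1, & \theta_i^{(1)} < x_i < \theta_i^{(2)} \\ \vdots \\ p, & x_i > \theta_i^{(p)} \end{cases}$$

The vector $\hat{x}$ denotes the logical state of the regulatory network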



5.3 GLN – Definition

The pattern of interactions is described by logical equations of the form:

$$\hat{X}_i(t) = b_i(\hat{x}(t)), \quad 1 \le i \le n$$

$\hat{X}_i$ is called the image of $\hat{x}_i$; it denotes the value towards which $\hat{x}_i$ tends when the logical state is $\hat{x}$

Positive and negative feedback loops can be modeled

Refinement of the simple on/off variables in Boolean Networks



5.4 GLN – Properties

A logical steady state occurs when the logical state equals its image:

$$\hat{x} = \hat{X}$$

Since the number of logical states is finite, one can test every state for being a logical steady state; all other states are called transient logical states

If the system is in a transient logical state, it will make a transition into another logical state

Since a logical variable will move into the direction of its image, the successor states can be deduced by comparing the value of a logical variable with that of its image

The logical states and transitions among them can be organized in a state transition graph

In analyses of state transitions, time delays, translation and transport can be taken into account

Improves standard Boolean Network model
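The following sketch illustrates logical variables, images and asynchronous transitions for a hypothetical two-element network. The logical equations in `image`, the value ranges (element 1 is assumed to have two thresholds, element 2 one) and the rule that a variable moves one level at a time towards its image are assumptions made for this example.

```python
from bisect import bisect_right
from itertools import product

def logical_value(x, thresholds):
    """Number of thresholds that are <= the concentration x (0..p)."""
    return bisect_right(sorted(thresholds), x)

def image(state):
    """Hypothetical logical equations: returns the image of each variable."""
    x1, x2 = state
    X1 = 2 if x2 == 0 else 0    # element 1 is repressed by element 2
    X2 = 1 if x1 == 2 else 0    # element 2 is activated by high levels of element 1
    return (X1, X2)

def successors(state):
    """Asynchronous update: one variable at a time moves one level
    towards its image, so a transient state may have several successors."""
    result = []
    for i, (value, target) in enumerate(zip(state, image(state))):
        if value != target:
            nxt = list(state)
            nxt[i] = value + (1 if target > value else -1)
            result.append(tuple(nxt))
    return result

def steady_states(states):
    """Logical steady states: the logical state equals its image."""
    return [s for s in states if image(s) == s]

ALL_STATES = list(product(range(3), range(2)))
```

Calling `successors` on every element of `ALL_STATES` yields the edges of the state transition graph. In this example the negative feedback loop leaves no logical steady state, and the state (1, 1) has two possible successors.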



6.1 Nonlinear Ordinary Differential Equations - Definition

Models the concentrations of RNA, proteins and other molecules by time-dependent variables

Gene regulation is modeled by rate equations, expressing the rate of production of a component as a function of the concentrations of other components

Rate equations have the following form:

$$\frac{dx_i}{dt} = f_i(x), \quad 1 \le i \le n$$

where x = [x1, ..., xn] ≥ 0 denotes the vector of concentrations and ƒi: R^n → R is a usually non-linear function

Discrete time delays τi1, ..., τin > 0 can also be represented:

$$\frac{dx_i}{dt} = f_i\bigl(x_1(t - \tau_{i1}), \dots, x_n(t - \tau_{in})\bigr), \quad 1 \le i \le n$$



6.2 ODE - Definition

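Goal: specifying the function ƒi, for example:

$$\frac{dx_1}{dt} = k_{1,n}\, r(x_n) - \gamma_1 x_1$$

$$\frac{dx_i}{dt} = k_{i,i-1}\, x_{i-1} - \gamma_i x_i, \quad 1 < i \le n$$

k1,n, k2,1, ..., kn,n-1 > 0 are production constants and γ1, ..., γn > 0 are degradation constants

The rate equations express a balance between the number of molecules appearing and disappearing per unit time

For x1, a regulation function r: R → R is involved, whereas the production of xi for i > 1 increases linearly in xi-1

An often used regulation function is the so-called Hill curve:

$$h^+(x_j, \theta_j, m) = \frac{x_j^m}{x_j^m + \theta_j^m}$$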



6.3 ODE - Definition

θj > 0 describes the threshold for the regulatory influence of xj on a target gene

m > 0 is the steepness parameter

The h+ function ranges from 0 to 1

An increase in xj (xj → ∞) will tend to increase the expression rate of a gene (activation)

In order to express that an increase of xj will tend to decrease the expression rate (inhibition), the regulation function is replaced by:

$$h^-(x_j, \theta_j, m) = 1 - h^+(x_j, \theta_j, m)$$



6.4 ODE Properties

Advantages:

More "realistic" way of modeling

Disadvantages:

Lack of in vivo or in vitro measurements of the kinetic parameters in the rate equations

Numerical parameter values are available for only a handful of well-studied systems (e.g. bacteriophage λ)

In most cases parameter values had to be chosen such that the models were able to reproduce the observed qualitative behavior

For larger models, finding appropriate values may be difficult

Solution:

The growing availability of data could alleviate the problem to some extent
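As an illustration of the rate equations and Hill-type regulation functions from 6.2 and 6.3, the sketch below integrates a small feedback chain with the forward Euler method. It assumes that the last component inhibits production of the first one (r = h-); the integration scheme, step size and all parameter values are arbitrary choices for the example, not values from the paper.

```python
def hill_plus(x, theta, m):
    """Activating Hill curve h+(x, theta, m), ranging from 0 to 1."""
    return x**m / (x**m + theta**m)

def hill_minus(x, theta, m):
    """Inhibiting Hill curve h-(x, theta, m) = 1 - h+(x, theta, m)."""
    return 1.0 - hill_plus(x, theta, m)

def simulate(x0, k, gamma, theta, m, dt=0.01, steps=5000):
    """Forward-Euler integration of
         dx_1/dt = k_1 * h-(x_n) - gamma_1 * x_1
         dx_i/dt = k_i * x_(i-1) - gamma_i * x_i   for i > 1
    (k[i] plays the role of the production constant of component i+1)."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [k[0] * hill_minus(x[-1], theta, m) - gamma[0] * x[0]]
        dx += [k[i] * x[i - 1] - gamma[i] * x[i] for i in range(1, n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x
```

With three components, all constants set to 1, theta = 0.5 and m = 2, the concentrations settle at a steady state in which production and degradation balance. Much steeper regulation functions can instead lead to sustained oscillations, which is the kind of qualitative behavior that parameter choices have to reproduce.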



7.1 Piecewise-Linear Differential Equations (PLDE) - Definition

Special case of the rate equations, based on two simplifications:

Interactions are modeled by directly relating the expression levels of genes in the network

Continuous sigmoid curves are approximated by discontinuous step functions

PLDEs have the following form:

$$\frac{dx_i}{dt} = g_i(x) - \gamma_i x_i, \quad 1 \le i \le n$$

where xi denotes the cellular concentration of the product of gene i and γi > 0 the degradation rate

The function gi: R^n≥0 → R≥0 is defined as:

$$g_i(x) = \sum_{l \in L} k_{il}\, b_{il}(x)$$

where L is a set of indices, kil > 0 is a rate parameter and bil: R^n≥0 → {0,1} is a combination of step functions

bil is the arithmetic equivalent of a Boolean function, expressing the conditions under which the gene is expressed at rate kil



7.2 PLDE - Graphical simplification

Consider an n-dimensional hyperbox defined by:

$$0 \le x_i \le \mathit{max}_i, \quad 1 \le i \le n$$

Assume that for all threshold concentrations θik of the protein encoded by gene i it holds that θik < maxi

The (n−1)-dimensional hyperplanes xi = θik defined by the thresholds divide the box into orthants

In each orthant of the box, the PLDEs reduce to ODEs with a constant production term μi composed of rate parameters in gi:

$$\frac{dx_i}{dt} = \mu_i - \gamma_i x_i$$



7.3 PLDE - Example

State equations corresponding to the orthant 0 ≤ x1 < θ21, θ12 < x2 ≤ max2 and θ33 < x3 ≤ max3
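To make the reduction within an orthant concrete, the sketch below uses a hypothetical two-gene network with mutual repression (not the three-gene example of the slide). The step function `step_minus`, the parameter values and the closed-form solution x_i(t) = μ_i/γ_i + (x_i(0) − μ_i/γ_i)·exp(−γ_i t) of the linear orthant equations are assumptions and standard calculus added for illustration.

```python
import math

def step_minus(x, theta):
    """Step function: 1 below the threshold, 0 otherwise."""
    return 1.0 if x < theta else 0.0

# Hypothetical two-gene network with mutual repression.
K = (2.0, 2.0)        # rate parameters k_i
GAMMA = (1.0, 1.0)    # degradation rates gamma_i
THETA = (1.0, 1.0)    # THETA[j]: threshold of the product of gene j

def production(x):
    """g(x): each gene is expressed at rate k_i while its repressor
    is below its threshold, and not at all otherwise."""
    return (K[0] * step_minus(x[1], THETA[1]),
            K[1] * step_minus(x[0], THETA[0]))

def orthant_solution(x0, t):
    """Solution of dx_i/dt = mu_i - gamma_i * x_i, valid as long as
    the trajectory stays inside the orthant that contains x0."""
    mu = production(x0)
    return tuple(m / g + (x - m / g) * math.exp(-g * t)
                 for x, m, g in zip(x0, mu, GAMMA))
```

Starting from (0.5, 1.5), gene 1 is repressed and gene 2 is fully expressed, so the state moves monotonically towards (0, 2) without ever crossing a threshold.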



8.1 Qualitative Differential Equations (QDE) - Definition

Incomplete understanding of GRNs and the absence of quantitative knowledge → need for qualitative simulation techniques

Idea behind QDEs: abstract a discrete description from a continuous model

The discrete abstraction is then used to draw conclusions about the dynamics of the system

QDEs are abstractions of ODEs of the form:

$$\frac{dx_i}{dt} = f_i(x), \quad 1 \le i \le n$$

where ƒi: R^n → R and the variables x take a qualitative value composed of a qualitative magnitude and direction



8.2 QDE - Properties

The qualitative magnitude of a variable xi is a discrete abstraction of its real value, the qualitative direction is the sign of its derivative

The function ƒi is abstracted into a set of qualitative constraints

Algorithm (QSIM) generates a tree of qualitative behaviors out of an initial qualitative state consisting of qualitative values

Each behavior in the tree describes a possible sequence of state transitions from the initial state

Every qualitatively distinct behavior of the ODE corresponds to a behavior in the tree generated from the QDE (the reverse may not be true)
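The notion of a qualitative value can be illustrated with a small abstraction function. This is not the QSIM algorithm: it only shows how a real value and its derivative might be mapped to a magnitude (relative to a set of landmark values, which are an assumption of this sketch) and a direction.

```python
def qualitative_value(x, dxdt, landmarks):
    """Abstract a real value and its derivative into (magnitude, direction).

    The magnitude is either a landmark value or the open interval
    between two neighbouring landmarks; the direction is the sign
    of the derivative (increasing, steady, decreasing)."""
    marks = sorted(landmarks)
    if x in marks:
        magnitude = ("at", x)
    else:
        below = [l for l in marks if l < x]
        above = [l for l in marks if l > x]
        magnitude = ("between",
                     below[-1] if below else None,
                     above[0] if above else None)
    direction = "inc" if dxdt > 0 else "dec" if dxdt < 0 else "std"
    return magnitude, direction
```

Many different numerical states map to the same qualitative state, which is why a qualitative simulation can cover all behaviors of the ODE but may also produce behaviors that no numerical solution shows.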



8.3 QDE - Summary

Problems:

Limited upscalability: behavior trees quickly grow out of bounds

Solutions:

Using a simulation algorithm tailored to the equations, larger networks with complex feedback loops can be treated

Advantages:

QDEs allow weak numerical information to be used

Integration of numerical information is more difficult to achieve in logical approaches



8.4 QDE – HYPGENE / GENSIM

Qualitative process theory is used for construction and revision of gene regulation models

A user definition and a knowledge base are used by GENSIM to simulate a proposed experiment

If the predictions do not match the observations, the HYPGENE algorithm generates hypotheses to explain the discrepancies

HYPGENE revises assumptions about the experimental conditions

This helps to refine the model

Both algorithms have been able to partially reproduce the experimental reasoning behind the attenuation mechanism regulating the synthesis of tryptophan in E. coli



9.1 Partial Differential Equations (PDE) – Motivation

So far, regulatory systems have been assumed to be spatially homogeneous

In certain situations it is important to drop this assumption

Distinguish between different compartments of a cell, for example nucleus and cytoplasm, or between multiple cells affecting each other

Diffusion of regulatory proteins or metabolites from one compartment to another

This is a critical feature in embryonal development



9.2 Partial Differential Equations (PDE) – Definition

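The reaction-diffusion equations (for a row of p cells, with l the index of a cell):

$$\frac{dx_i^{(l)}}{dt} = f_i(x^{(l)}) + \delta_i \bigl(x_i^{(l+1)} - 2 x_i^{(l)} + x_i^{(l-1)}\bigr), \quad 1 \le i \le n, \; 1 < l < p$$

where δi > 0 is a diffusion constant

Can be adapted to other one- or higher-dimensional spatial configurations

If the number of cells is large enough, the discrete variable l can be replaced by a continuous variable λ ∈ [0, λmax], where λmax represents the size of the system

The concentration variables are then defined as functions of λ and t, and the reaction-diffusion equations become partial differential equations (PDEs):

$$\frac{\partial x_i}{\partial t} = f_i(x) + \delta_i \frac{\partial^2 x_i}{\partial \lambda^2}, \quad 0 \le \lambda \le \lambda_{max}$$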

Using modes or eigenfunctions of the Laplacian operator gives:



9.3 Partial Differential Equations (PDE) – Definition

Product of gene 1, the activator, must positively regulate itself; product of gene 2, the inhibitor, must negatively regulate gene 1

Activator-inhibitor-systems were extensively used to study the emergence of segmentation patterns in the early Drosophila embryo

The observed spatial and temporal expression patterns of genes closely resemble the modes of the model

Numerical simulations demonstrated that some aspects of stripe formation in the Drosophila blastoderm can indeed be reproduced this way
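A numerical simulation of the discrete reaction-diffusion equations for a row of cells can be sketched as below. The explicit Euler scheme, the no-flux treatment of the two end cells and the parameter values are assumptions of this sketch. Reproducing stripe patterns would additionally require a suitable activator-inhibitor reaction term `f`, which is not attempted here.

```python
def rd_step(x, f, delta, dt):
    """One Euler step of
         dx_i(l)/dt = f_i(x(l)) + delta_i * (x_i(l+1) - 2 x_i(l) + x_i(l-1)).

    x[l][i] is the concentration of product i in cell l. The end cells
    use no-flux boundaries: the missing neighbour is replaced by the
    cell itself (an assumption, boundary conditions are model-specific)."""
    p, n = len(x), len(x[0])
    new = []
    for l in range(p):
        left = x[max(l - 1, 0)]
        right = x[min(l + 1, p - 1)]
        reaction = f(x[l])
        new.append([
            x[l][i] + dt * (reaction[i]
                            + delta[i] * (right[i] - 2 * x[l][i] + left[i]))
            for i in range(n)
        ])
    return new

def simulate(x, f, delta, dt, steps):
    """Repeat rd_step; dt * max(delta) should stay below 0.5 for stability."""
    for _ in range(steps):
        x = rd_step(x, f, delta, dt)
    return x
```

With the reaction term switched off, a product that starts in the first cell only spreads evenly over the row while its total amount is conserved, which is a simple check that the diffusion term is implemented consistently.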



9.4 Partial Differential Equations (PDE) – Properties

The formula shown is still not applicable in all situations; more complex formulas have been developed for several special cases

Predictions are quite sensitive to the shape of the spatial domain, the boundary conditions and the chosen parameter values

Models need to be simple and are usually strong abstractions of biological processes (e.g. only the concentrations of a few gene products are considered)

For larger and more complex models computational costs for finding an optimal fit between data and parameters may be prohibitively high



10.1 Stochastic Master Equations - Motivation

Differential equations describe gene regulation in great detail

Differential equations presuppose that the concentrations of substances vary continuously and deterministically

Both assumptions are questionable in the case of gene regulation

So a discrete and stochastic approach is preferred

Discrete amounts X of molecules are taken as state variables; a joint probability distribution p(X, t) is introduced to express the probability that at time t the cell contains X1 molecules of the first species, X2 of the second, etc.



10.2 Stochastic Master Equations - Definition

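The time evolution of p(X, t) can be expressed as:

$$p(X, t + \Delta t) = p(X, t)\Bigl(1 - \sum_{j=1}^{m} \alpha_j \Delta t\Bigr) + \sum_{j=1}^{m} \beta_j \Delta t$$

where m is the number of reactions,

αjΔt is the probability that reaction j will occur in the interval [t, t+Δt] given that the system is in state X at time t,

βjΔt is the probability that reaction j will bring the system into state X from another state in [t, t+Δt]

Rearranging and taking the limit Δt → 0 gives the master equation:

$$\frac{\partial p(X, t)}{\partial t} = \sum_{j=1}^{m} \bigl(\beta_j - \alpha_j\, p(X, t)\bigr)$$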



10.3 Stochastic Master Equations - Properties

Master equations can be approximated by stochastic differential equations

An alternative approach is to disregard the master equation and directly simulate the time evolution of the system

This is the stochastic simulation approach:

Determine when the next reaction occurs and of which type it will be

Revise the state in accordance with this reaction

Continue at the resulting next state

Master equations deal with the average behavior, whereas stochastic simulation provides information on individual behaviors
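The three steps of the stochastic simulation approach can be sketched as follows, in the style of Gillespie's direct method. The reaction system (constant production and first-order degradation of a single species), its rate constants and the function names are assumptions chosen for illustration.

```python
import random

def gillespie(x, reactions, t_end, rng):
    """Stochastic simulation of one individual behavior up to time t_end.

    reactions is a list of (propensity_function, state_change) pairs;
    rng is a random.Random instance."""
    t = 0.0
    x = list(x)
    while True:
        rates = [prop(x) for prop, _ in reactions]
        total = sum(rates)
        if total == 0.0:
            return x                      # no reaction can occur any more
        t += rng.expovariate(total)       # when does the next reaction occur?
        if t > t_end:
            return x
        # Which reaction is it? Reaction j is chosen with probability
        # rates[j] / total.
        threshold = rng.random() * total
        acc = 0.0
        for rate, (_, change) in zip(rates, reactions):
            acc += rate
            if threshold < acc:
                break
        # Revise the state and continue at the resulting next state.
        x = [xi + ci for xi, ci in zip(x, change)]

# Hypothetical system: production at a constant rate, degradation
# proportional to the number of molecules.
K_PROD, K_DEG = 10.0, 1.0
REACTIONS = [
    (lambda x: K_PROD, (+1,)),
    (lambda x: K_DEG * x[0], (-1,)),
]
```

A single call such as `gillespie([0], REACTIONS, 20.0, random.Random(1))` gives one individual behavior. Averaging the results of many runs with different seeds approximates the mean of p(X, t), which for this system approaches K_PROD / K_DEG.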



10.4 Stochastic Master Equations - Summary

Advantages:

Simulation results in closer approximations to the molecular reality of gene regulation

Disadvantages:

Whether the use of stochastic simulation is worthwhile is not always evident

Requires detailed knowledge of the reaction mechanisms

Simulation is costly



11.1 Rule-Based Formalisms (RBF) - Definition

Knowledge-based or rule-based simulation formalisms permit a rich variety of knowledge about the system to be expressed in a single formalism

Consist of two components: facts and rules

The rules consist of two parts: condition and action

Advantages:

Capability to deal with a richer variety of biological knowledge

Disadvantages:

Difficulties in maintaining a consistent knowledge base

RBFs cannot compete quantitatively with the formalisms discussed before
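The structure of facts and condition/action rules can be illustrated with a minimal forward-chaining loop. The facts and rules below are a simplified, hypothetical knowledge base, and the actions here can only add facts. Real rule-based formalisms are considerably richer.

```python
def run_rules(facts, rules):
    """Forward chaining: fire rules until no rule adds a new fact.

    Each rule is a (condition, action) pair of sets: if all facts in
    the condition hold, the facts in the action are added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, action in rules:
            if condition <= facts and not action <= facts:
                facts |= action
                changed = True
    return facts

# Hypothetical knowledge base (fact names are made up for illustration).
RULES = [
    ({"tryptophan_low"}, {"repressor_inactive"}),
    ({"repressor_inactive"}, {"operon_transcribed"}),
]
```

Starting from the single fact `tryptophan_low`, both rules fire in turn. Starting from an empty set of facts, nothing is derived.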



12.1 Conclusions

Major difficulties in modeling and simulating genetic regulatory networks:

Biochemical reaction mechanisms are not known or incompletely known

Quantitative information on kinetic parameters and molecular concentrations is only seldom available

The formalisms discussed allow GRSes to be modeled in quite different ways – depending on the application:



12.2 Expectations

The emergence of new experimental techniques promises to relieve the data bottleneck

Increasing knowledge of molecular mechanisms allows regulatory systems to be modeled at a finer level of granularity

The use of quantitative models permits larger systems to be studied at a higher precision

These developments should bring researchers nearer to the ultimate goal: to use models that integrate gene regulation with metabolism, signal transduction, replication and repair, and a variety of other cellular processes

Each of the approaches above has its merits, but neither of them seems sufficient in itself

It can be expected that a combination of the two approaches, exploiting a wide range of structural and functional information on regulatory networks, will be most effective



13.1 References / Acknowledgements

References (all images are also taken from this paper):

Hidde de Jong, Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67-103.

Acknowledgements:

Thanks to Marite Sirava and Thomas Schäfer at the ZBI of Universität des Saarlandes for supporting us in working out this talk