Using Gröbner Bases to Reconstruct Regulatory Modules in C. elegans
Brandilyn Stigler, Southern Methodist University
SAMSI
September 16, 2008
Brandy’s Bio
Education and Training
  PhD in mathematics – 2005, Virginia Tech
    Advisor: Reinhard Laubenbacher
  Postdoctoral Fellow – 2008, Math. Biosci. Inst.
    Math mentor: Winfried Just – Math, Ohio U.
    Bio mentor: Helen Chamberlin – Molecular genetics, OSU
Research Interests
  Systems biology
    Reverse engineering of gene regulatory networks
  Computational algebra
    Gröbner bases of zero-dimensional radical ideals
Systems Biology
(Kitano) The study of an organism, viewed as an integrated and interacting network of biochemicals, through understanding
  structure and dynamics
  methods of control and design
(Ideker) The study of biological systems by
  perturbing them and monitoring the responses
  integrating the data and
  formulating mathematical models that describe system structure and the response.
Overview
Models and methods in RE
Polynomial dynamical systems
An algorithm for reverse engineering using computational algebra
Application to tissue development in C. elegans
Mathematical Methods for Modeling
Continuous systems
Linear algebra
Statistics, Bayesian inference
Boolean algebra
Logic
Stochastic processes
(Figure: Ideker et al. Building with a scaffold: emerging strategies for high- to low-level cellular modeling. Trends Biotech 2003)
Reverse engineering: continuous systems
Yeung et al. (2002) built a model of linear ODEs for a gene regulatory network near a steady state.
  X = mRNA concentrations (given)
  W = type and strength of interaction (unknown)
  B = external stimuli (given)
Robust regression to select sparsest matrix; W0 particular solution, C vanishes on X
Singular value decomposition; U, V orthogonal
Challenges of RE Methods
Many models may fit the same data.
  Analysis of solution (model) space may be difficult.
  Model selection is crucial.
Continuous models: parameters may not be known
  Needed: methods to “learn” parameters
  Solution: genetic algorithms, for example
Boolean models: algorithms based on enumeration
  Needed: algorithms to compute “space” of models
  Solution: use of algebraic techniques
Mathematical Methods for Modeling
Computational algebra: polynomial dynamical systems
(Figure: Ideker et al. Building with a scaffold: emerging strategies for high- to low-level cellular modeling. Trends Biotech 2003)
Polynomial Dynamical Systems
Genes (proteins, etc.): g1, g2, …, gn
Variables with states in a finite set S: x1, x2, …, xn
Transition functions fi: f1(x1,…,xn), f2(x1,…,xn), …, fn(x1,…,xn)
Finite dyn. sys.: f = ( f1(x1,…,xn), f2(x1,…,xn), …, fn(x1,…,xn) )
If |S| = prime, then S ≈ field.
Theorem: Every function fi : S^n → S is a polynomial in S[x1,…,xn].
Polynomial dynamical system (PDS) := finite dyn. sys. f : S^n → S^n over a finite field
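One standard way to see why the theorem holds, sketched here for S = Z_p with p prime (this interpolation formula is a textbook construction, not taken from the slides): by Fermat's little theorem, (x_i − a_i)^(p−1) equals 1 when x_i ≠ a_i and 0 when x_i = a_i, so each product below is the indicator of the single point a, and the sum reproduces any given table of values.

```latex
f(x_1,\dots,x_n) \;=\; \sum_{a \in \mathbb{Z}_p^{\,n}} f(a)\,\prod_{i=1}^{n}\Bigl(1 - (x_i - a_i)^{p-1}\Bigr)
```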
PDSs store structure and dynamics
f = ( f1, f2, f3 ) : (Z3)^3 → (Z3)^3
f1 = – x3^2 + x1
f2 = x3^2 – x3 + 1
f3 = – x3^2 + x1 + 1
(Figure: wiring diagram (WD) on nodes x1, x2, x3; state space showing a fixed point and a limit cycle)
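A minimal sketch of how structure and dynamics can both be read off a PDS by brute force. It assumes the three polynomials exactly as they are printed on this slide; the names `f`, `state_space`, `inputs_of`, and `wiring_diagram` are illustrative, not part of any published tool.

```python
from itertools import product

P = 3  # states live in Z_3

def f(state):
    """The PDS as printed on the slide, evaluated mod 3."""
    x1, x2, x3 = state
    return ((-x3**2 + x1) % P,
            (x3**2 - x3 + 1) % P,
            (-x3**2 + x1 + 1) % P)

states = list(product(range(P), repeat=3))

# Dynamics: the state space is the graph state -> f(state) on all 27 states.
state_space = {s: f(s) for s in states}
fixed_points = [s for s in states if state_space[s] == s]

def inputs_of(i):
    """Variables x_j whose value can change f_i (brute-force dependency check)."""
    deps = set()
    for s in states:
        for j in range(3):
            for v in range(P):
                t = s[:j] + (v,) + s[j + 1:]
                if f(t)[i] != f(s)[i]:
                    deps.add(f"x{j + 1}")
    return deps

# Structure: wiring diagram as a map from each target to its set of inputs.
wiring_diagram = {f"x{i + 1}": inputs_of(i) for i in range(3)}

print(wiring_diagram)
print(fixed_points)
```

Exhaustive enumeration is only feasible for toy systems like this one, since the state space has |S|^n states.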
Computing PDSs from Data
Input: T = {s1,…,st} time series in k^n (k = a finite field)
Output: F = a minimal PDS
• Find one particular solution f^0 = (f1,…,fn) with f^0(si) = si+1.
• Construct ideal of vanishing functions I = < g(x): g(si) = 0 >.
  All PDSs that fit T: f^0 + I := { (f1,…,fn) + (g1,…,gn) : gi in I }.
• Select minimal PDS F = (F1,…,Fn) with Fi = fi % I (the normal form of fi with respect to a Gröbner basis of I).
Implemented in Macaulay 2
Available at http://polymath.vbi.vt.edu/rev-eng/reveng.php
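The first two bullets can be sketched in plain Python. This is a sketch under stated assumptions: the time series is the one used in the deck's worked example over Z_3, the particular solution is built with the point-indicator interpolation (1 − (x − a)^(p−1)), which is one possible construction and not necessarily the one in the Macaulay 2 implementation, and the names `indicator`, `particular_solution`, `g`, and `shifted` are hypothetical. The Gröbner basis reduction of the third bullet is not attempted here.

```python
P = 3  # work over the field Z_3
series = [(2, 2, 1), (0, 1, 2), (1, 0, 1), (0, 1, 0)]

def indicator(point, x):
    """Polynomial expression that is 1 at x == point and 0 elsewhere (mod P)."""
    value = 1
    for xi, ai in zip(x, point):
        value = value * (1 - pow(xi - ai, P - 1, P)) % P
    return value

def particular_solution(series):
    """Return f0 with f0(s_i) = s_{i+1}; assumes the input states are distinct."""
    inputs, outputs = series[:-1], series[1:]
    def f0(x):
        return tuple(
            sum(out[j] * indicator(inp, x) for inp, out in zip(inputs, outputs)) % P
            for j in range(len(x))
        )
    return f0

f0 = particular_solution(series)

def g(x):
    """x1 + x2 - 1, which vanishes on every state of this series (mod 3)."""
    return (x[0] + x[1] - 1) % P

def shifted(x):
    """Another member of f0 + I: add the vanishing polynomial g to coordinate 1."""
    y = f0(x)
    return ((y[0] + g(x)) % P,) + y[1:]

for s, t in zip(series, series[1:]):
    print(s, "->", f0(s), shifted(s), "expected", t)
```

Both `f0` and `shifted` reproduce the data, which is the sense in which f^0 + I describes every model that fits T; choosing among them is what the normal form step is for.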
Gröbner Bases
Let > be a term order, I an ideal in k[x1,…,xn], and f a polynomial.
G = {g1,…,gm} ⊂ I is a Gröbner basis for I if the leading term of every nonzero polynomial in I is divisible by the leading term of some gi under >.
The normal form of f with respect to G, NF(f, G), is the remainder of f on division by G.
Gröbner bases exist (not unique).
NF(f, G) is unique.
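A standard textbook illustration (not from the slides) of why the Gröbner basis property matters for normal forms: under the lex order with x > y, divide f = xy^2 − x by the pair f1 = xy + 1, f2 = y^2 − 1, which is not a Gröbner basis of the ideal it generates. The remainder then depends on the order in which the divisors are tried:

```latex
\begin{aligned}
\text{divisors } (f_1, f_2):&\quad xy^2 - x \;=\; y\,(xy + 1) \;+\; 0\,(y^2 - 1) \;+\; (-x - y),\\
\text{divisors } (f_2, f_1):&\quad xy^2 - x \;=\; x\,(y^2 - 1) \;+\; 0\,(xy + 1) \;+\; 0.
\end{aligned}
```

Here f lies in the ideal, yet the first division leaves the nonzero remainder −x − y, whose leading term x is divisible by neither xy nor y^2, so the pair fails the definition. For a Gröbner basis G, the remainder NF(f, G) is the same whichever order is used.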
RE Methods using PDSs
Reference | Method
R Laubenbacher, BS. 2004. | Gröbner bases (GB)
E Allen, J Fetrow, L Daniel, S Thomas, D John. 2006. | GB and Deegan–Packel Index of Power (DPIP)
E Dimitrova, A Jarrah, R Laubenbacher, BS. 2007. | Gröbner fan (GF) and DPIP
D Heldt, M Kreuzer, S Pokutta, H Poulisse. 2006. | Approximate GB
P Vera-Licona. 2007. | Evolutionary algorithm
A Jarrah, R Laubenbacher, M Stillman, BS. 2007. | Minimal sets (MS)
BS, A Jarrah, R Laubenbacher, P Mendes. 2007. | MS and GF
Reverse Engineering using PDSs
Stigler, Jarrah, Laubenbacher, Mendes. Reverse engineering of dynamic networks. NY Acad Sci 2007
Jarrah, Laubenbacher, Stillman, Stigler. Reverse engineering of polynomial dynamical systems. Adv Appl Math 2007
(Figure: pipeline. Experimental data for G1, G2, G3 at times t1 to t5 → mutual information → discrete data → ideal <-> variety → model space → primary decomposition → minimal WDs → Gröbner fan → minimal PDSs)
Discrete data:
  Time  t1 t2 t3 t4 t5
  x1    1  2  0  0  0
  x2    0  1  2  1  1
  x3    0  2  0  0  0
Model space: { f + I | f(ti) = ti+1, I = < g(x) : g(ti) = 0 > }
Computing Minimal WDs
Primary decomposition produces minimal sets of variables required to define a PDS, thereby computing the intersection of all wiring diagrams.
Data for x2, written as (x1 x2 x3) at time ti → x2 at time ti+1:
  (1 0 0) → 1
  (2 1 2) → 2
  (0 2 0) → 1
  (0 1 0) → 1
Encode: <x1x2x3, x1x3> = <x1x3> = <x1> ∩ <x3>
Interpret: x1->x2 (or x3->x2) in all WDs
Jarrah et al. Reverse engineering of polynomial dynamical systems. Adv Appl Math 2007
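The encoding step can be mimicked without a computer algebra system. For a square-free monomial ideal, the minimal primes correspond to the minimal sets of variables that meet every generator, so a small hitting-set search reproduces the decomposition for this example. This is a sketch, assuming the four input/output pairs listed on the slide; `difference_sets` and `minimal_hitting_sets` are hypothetical helper names, and the exhaustive search is only meant for small n.

```python
from itertools import combinations

# (x1, x2, x3) at time t -> x2 at time t+1, as listed on the slide
data = [((1, 0, 0), 1), ((2, 1, 2), 2), ((0, 2, 0), 1), ((0, 1, 0), 1)]

def difference_sets(data):
    """For each pair of inputs with different outputs, the coordinates where they differ.

    Each set plays the role of one monomial generator (e.g. {0, 2} is x1*x3).
    """
    sets = set()
    for (a, out_a), (b, out_b) in combinations(data, 2):
        if out_a != out_b:
            sets.add(frozenset(i for i in range(len(a)) if a[i] != b[i]))
    return sets

def minimal_hitting_sets(sets, n):
    """All inclusion-minimal sets of variable indices that meet every difference set."""
    found = []
    for size in range(n + 1):
        for cand in combinations(range(n), size):
            c = set(cand)
            if all(c & s for s in sets) and not any(h <= c for h in found):
                found.append(c)
    return found

diffs = difference_sets(data)
minimal = minimal_hitting_sets(diffs, 3)
print([{f"x{i + 1}" for i in m} for m in minimal])
```

Each minimal set is a smallest collection of inputs that can explain the observed values of x2, matching the two components <x1> and <x3> above.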
Exploring the Model Space
Model space: { f + I | f(ti) = ti+1, I = < g(x) : g(ti) = 0 > }
Term orders in a “cone” give the same model.
Gröbner fan partitions the term order “space” and allows for efficient exploration of model space to find most representative model.
Method Validation: Segment polarity network in the fruit fly
Network in cell: 15 genes, proteins
  Boolean model (Albert, Othmer 2003)
  44 known interactions
  6 extracellular interactions
Time series data
  Generated wildtype, knockout
  < 0.01% of 2^21 total states
4 most likely PDSs
  82% links (36 TP, 2 FP, 8 TN)
  100% terms for 6 fncs; 88% TP, 39.5% TN for 9 fncs
  Missing terms = unobserved interactions
  100% fixed points
Stigler et al. Reverse-Engineering of Dynamic Networks. NY Acad of Sci 2007
Identification of Muscle Module in Caenorhabditis elegans
Fukushige et al. Defining the transcriptional redundancy of early bodywall muscle development in C. elegans: evidence for a unified theory of animal muscle development. Genes and Development 2006
Regulatory Modules in C. elegans
Baugh et al. (Development 2005) identified tissue-identity genes (TIGs) := targets of PAL-1.
Our goals:
  Model TIG network using their published microarray time series data.
  Reconstruct muscle module.
  Identify ectoderm module.
Joint work with
  H. Chamberlin - OSU
  R. Hill - OSU
  R. Laubenbacher - VBI
Regulatory Module for TIG Network
  Time series contains 10 points.
  Data discretized to 5 states.
  Predicted modules for muscle, ectoderm.
  Most edges in muscle module supported in literature.
  New prediction for timing of regulatory interactions.
(Figure: predicted module on nodes pal-1, C55C2.1, unc-120, hlh-1, hnd-1)
Conclusions
Algorithm reverse engineers networks by
  Identifying minimal WDs
  Computing all minimal PDSs on the WD.
Advantages of PDSs
  Provide compact representation of model space and framework within which to analyze the model space
  Facilitate hypothesis generation for further network exploration and discovery.
Applications to gene regulatory networks
  High identification in fruit fly network
  Reconstructed C. elegans muscle, proposed ectoderm module
  Generated new hypotheses for regulation timing
  Potential for predictions about mechanisms
Collaborations
C. elegans
  H. Chamberlin – OSU, molecular genetics
  R. Hill – OSU, molecular genetics
  R. Laubenbacher – VBI, comp. algebra
Yeast Group @ VBI
Simulated networks
  D. Camacho – Boston U, biochemistry
  E. Dimitrova – Clemson, comp. algebra
  A. Jarrah – VBI, comp. algebra
  R. Laubenbacher – VBI, comp. algebra
  P. Mendes – Manchester, biochemistry
  P. Vera Licona – DIMACS, comp. algebra
Development of theory
  W. Just – Ohio U, logic/math bio
  A. Taylor – Colorado College, comm. algebra
Development of algorithms
  W. Just – Ohio U, logic/math bio
  R. Laubenbacher – VBI, comp. algebra
  M. Stillman – Cornell, comp. algebra
Computing PDSs from Data
Data: (2 2 1) -> (0 1 2) -> (1 0 1) -> (0 1 0)
Step 1: f^0 = (f1^0, f2^0, f3^0) (polynomial expressions not recoverable from the extracted text)
Step 2: I = < x1 + x2 – 1, x2x3 – x3^2 + x2 – x1, x2^2 – x3^2 + x2 – x3 > = ∩ < x1 – si1, …, xn – sin >
Step 3:
  f1 = f1^0 mod GB(I) = – x3^2 + x3
  f2 = f2^0 mod GB(I) = x3^2 – x3 + 1
  f3 = f3^0 mod GB(I) = – x3^2 + x2 + 1
Requires a term order: grevlex with x1 > x2 > x3
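A quick check of the Step 3 result, as a sketch: it assumes arithmetic mod 3 and reads the data as the four states (2 2 1), (0 1 2), (1 0 1), (0 1 0). The code confirms that the reduced polynomials reproduce every transition in the time series and then enumerates the attractors of the resulting PDS. The names `F` and `attractor` are illustrative.

```python
from itertools import product

P = 3
series = [(2, 2, 1), (0, 1, 2), (1, 0, 1), (0, 1, 0)]

def F(state):
    """The minimal PDS from Step 3, evaluated mod 3."""
    x1, x2, x3 = state
    return ((-x3**2 + x3) % P,
            (x3**2 - x3 + 1) % P,
            (-x3**2 + x2 + 1) % P)

# Does F map each state of the time series to the next one?
fits = all(F(s) == t for s, t in zip(series, series[1:]))

def attractor(state):
    """Follow the trajectory from `state` until it repeats; return the cycle entered."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = F(state)
    cycle = seen[seen.index(state):]
    k = cycle.index(min(cycle))  # rotate to a canonical starting state
    return tuple(cycle[k:] + cycle[:k])

attractors = {attractor(s) for s in product(range(P), repeat=3)}
fixed_points = sorted(c[0] for c in attractors if len(c) == 1)
limit_cycles = sorted(c for c in attractors if len(c) > 1)

print("fits data:", fits)
print("fixed points:", fixed_points)
print("limit cycles:", limit_cycles)
```

The last observed state (0 1 0) has no successor in the data, so its image under F is a prediction of the model rather than something the time series constrains.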
Arithmetic in a Finite Number System
Zp = integers modulo p = {0, 1, …, p-1}
Z12 = {0, 1, …, 11} “clock” arithmetic
p prime => field
p not prime => ring
(Figure: clock face labeled with the integers –12 to 23, illustrating arithmetic modulo 12)
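The field versus ring distinction can be checked directly for small moduli. In this sketch (the helper names `units` and `zero_divisors` are illustrative), a prime modulus gives every nonzero element a multiplicative inverse, while a composite modulus such as 12 produces zero divisors.

```python
def units(n):
    """Elements of Z_n that have a multiplicative inverse."""
    return [a for a in range(1, n) if any(a * b % n == 1 for b in range(1, n))]

def zero_divisors(n):
    """Nonzero elements a of Z_n with a*b = 0 (mod n) for some nonzero b."""
    return [a for a in range(1, n) if any(a * b % n == 0 for b in range(1, n))]

print("Z_3 units:", units(3), "zero divisors:", zero_divisors(3))
print("Z_12 units:", units(12), "zero divisors:", zero_divisors(12))
print("clock arithmetic: 11 + 1 =", (11 + 1) % 12)
```

In Z_12, for instance, 3 · 4 = 0, so 3 cannot be inverted and division is not always possible; this is why the finite state set is taken to have a prime number of elements.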
Gene regulatory networks are the main objects of study in molecular SB
Interconnected biochemicals, including DNA-derived (mRNAs and proteins) and non-DNA-derived (metabolites)
DNA = recipe book
Gene = recipe
mRNA = copy of recipe
Protein = outcome of recipe
Metabolites = other “helpers”
A new mathematical modeling approach to biochemical networks, with an application to oxidative stress in yeast
Develop mathematical tools to model biochemical networks given experimental data
  Combine continuous models (ODEs) and discrete models (PDSs)
Apply to oxidative stress response network in the yeast S. cerevisiae
  Glutathione metabolism
Yeast Group @ VBI
Transcriptomic, proteomic, and metabolomic data, each with
  7 mutants + 1 wildtype (knockouts)
  3 replicates
  2 treatments (with and without CHP)
  8 time points
= 1152 total data points!
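The total is the product of the design factors across the three data types:

```latex
\underbrace{3}_{\text{data types}} \times \underbrace{(7+1)}_{\text{strains}} \times \underbrace{3}_{\text{replicates}} \times \underbrace{2}_{\text{treatments}} \times \underbrace{8}_{\text{time points}} \;=\; 1152
```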
Theoretical Improvements
Computing Gröbner bases (with W. Just – Ohio U)
  Implemented algorithm using LU decomposition in Macaulay 2
  Identifies essential variables (= support std mon)
  Reduces computation to ring in EV
  Complexity = O(nm^2 + m^4)
  Just and Stigler. Efficiently Computing Gröbner Bases of Ideals of Points. 2007 Submitted
Computing GB structure (with A. Taylor – Colo C)
  Extended Shape Lemma for graded orders
  Connecting to term detection in statistics with solution being noiseless linear regression
  Stigler and Taylor. Reverse Engineering Gröbner Bases. 2008 In Preparation