Using Gröbner Bases to Reconstruct Regulatory Modules in C. elegans

Brandilyn Stigler
Southern Methodist University

SAMSI

September 16, 2008

Brandy’s Bio

Education and Training
• PhD in mathematics – 2005, Virginia Tech
  Advisor: Reinhard Laubenbacher
• Postdoctoral Fellow – 2008, Math. Biosci. Inst.
  Math mentor: Winfried Just – Math, Ohio U.
  Bio mentor: Helen Chamberlin – Molecular genetics, OSU

Research Interests
• Systems biology
  Reverse engineering of gene regulatory networks
• Computational algebra
  Gröbner bases of zero-dimensional radical ideals


Systems Biology

(Kitano) The study of an organism, viewed as an integrated and interacting network of biochemicals, through understanding
• structure and dynamics
• methods of control and design

(Ideker) The study of biological systems by
• perturbing them and monitoring the responses
• integrating the data
• formulating mathematical models that describe system structure and the response.


Gene regulatory networks are the main objects of study in molecular SB


Molecular Systems Biology

“forward engineering”

“reverse engineering”


Overview

Models and methods in reverse engineering (RE)

Polynomial dynamical systems

An algorithm for reverse engineering using computational algebra

Application to tissue development in C. elegans


Mathematical Methods for Modeling

• Continuous systems
• Linear algebra
• Statistics, Bayesian inference
• Boolean algebra
• Logic
• Stochastic processes

[Figure from Ideker et al., "Building with a scaffold: emerging strategies for high- to low-level cellular modeling," Trends Biotech 2003]


Reverse engineering: continuous systems

Yeung et al. (2002) built a model of linear ODEs for a gene regulatory network near a steady state.

• X = mRNA concentrations (given)
• W = type and strength of interaction (unknown)
• B = external stimuli (given)

Robust regression to select the sparsest matrix; W0 is a particular solution, C vanishes on X

Singular value decomposition; U, V orthogonal


Challenges of RE Methods

Many models may fit the same data.
• Analysis of solution (model) space may be difficult.
• Model selection is crucial.

Continuous models: parameters may not be known
• Needed: methods to “learn” parameters
• Solution: genetic algorithms, for example

Boolean models: algorithms based on enumeration
• Needed: algorithms to compute “space” of models
• Solution: use of algebraic techniques


Mathematical Methods for Modeling

• Computational algebra
• Polynomial dynamical systems

[Figure from Ideker et al., "Building with a scaffold: emerging strategies for high- to low-level cellular modeling," Trends Biotech 2003]


Polynomial Dynamical Systems

Genes (proteins, etc.): g1, g2, …, gn

Variables x1, x2, …, xn with states in a finite set S

Transition functions fi

Finite dyn. sys. f = ( f1(x1,…,xn), f2(x1,…,xn), …, fn(x1,…,xn) )

If |S| = prime, then S ≈ field.

Theorem: Function fi : S^n → S = polynomial in S[x1,…,xn]

Polynomial dynamical system (PDS) := finite dyn. sys. f : S^n → S^n over a finite field
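The theorem can be made concrete with indicator polynomials. The construction below is a standard one added here for illustration; it assumes S = Z_p with p prime, and the symbol \delta_a is not from the slides. By Fermat's little theorem, c^{p-1} = 1 for every nonzero c in Z_p, so \delta_a is 1 at the state a and 0 at every other state.

```latex
\[
\delta_a(x) \;=\; \prod_{i=1}^{n}\Bigl(1-(x_i-a_i)^{p-1}\Bigr),
\qquad
\delta_a(b)=
\begin{cases}
1, & b=a,\\
0, & b\neq a,
\end{cases}
\]
\[
f_i(x) \;=\; \sum_{a\in S^n} f_i(a)\,\delta_a(x)
\quad\text{for every function } f_i : S^n \to S .
\]
```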


PDSs store structure and dynamics

f = ( f1, f2, f3 ) : (Z3)^3 → (Z3)^3

f1 = – x3^2 + x1

f2 = x3^2 – x3 + 1

f3 = – x3^2 + x1 + 1

[Figure: wiring diagram (WD) on nodes x1, x2, x3; state space, with labels "Fixed point" and "Limit cycle"]
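A minimal sketch of how a wiring diagram and the attractors of the state space can be read off a PDS by brute force. It assumes the polynomials on this slide are f1 = -x3^2 + x1, f2 = x3^2 - x3 + 1, f3 = -x3^2 + x1 + 1 over Z3 (reading the flattened "x32" as x3 squared). The helper names `wiring_diagram` and `attractors` are hypothetical and not part of the authors' software.

```python
from itertools import product

P = 3  # states live in Z3 = {0, 1, 2}

def f(state):
    """One synchronous update of the three-variable PDS, as read from the slide."""
    x1, x2, x3 = state
    return (
        (-x3**2 + x1) % P,
        (x3**2 - x3 + 1) % P,
        (-x3**2 + x1 + 1) % P,
    )

def wiring_diagram(update, n=3, p=P):
    """Edges (j, i), 1-based: changing x_j can change the value of f_i."""
    edges = set()
    for state in product(range(p), repeat=n):
        base = update(state)
        for j in range(n):
            for v in range(p):
                changed = state[:j] + (v,) + state[j + 1:]
                out = update(changed)
                for i in range(n):
                    if out[i] != base[i]:
                        edges.add((j + 1, i + 1))
    return sorted(edges)

def attractors(update, n=3, p=P):
    """Follow every state until it repeats; return (fixed points, longer cycles)."""
    fixed, cycles = set(), set()
    for start in product(range(p), repeat=n):
        seen = []
        s = start
        while s not in seen:
            seen.append(s)
            s = update(s)
        cycle = seen[seen.index(s):]
        if len(cycle) == 1:
            fixed.add(cycle[0])
        else:
            k = cycle.index(min(cycle))  # rotate to a canonical starting state
            cycles.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(fixed), sorted(cycles)

edges = wiring_diagram(f)
fixed_points, limit_cycles = attractors(f)
print("edges:", edges)
print("fixed points:", fixed_points)
print("limit cycles:", limit_cycles)
```

Enumerating all 27 states is only feasible for small examples like this one; it is meant to show what "structure" (edges) and "dynamics" (attractors) refer to, not how larger networks are analyzed.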


Computing PDSs from Data

Input: T = {s1,…,st}, a time series in k^n (k = a finite field)

Output: F = a minimal PDS

• Find one particular solution f^0 = (f1,…,fn) with f^0(si) = si+1.

• Construct the ideal of vanishing functions I = < g(x) : g(si) = 0 >.

All PDSs that fit T: f^0 + I := { (f1,…,fn) + (g1,…,gn) : gi in I }.

• Select the minimal PDS F = (F1,…,Fn) with Fi = fi % I, the remainder of fi on division by a Gröbner basis of I.

Implemented in Macaulay 2

Available at http://polymath.vbi.vt.edu/rev-eng/reveng.php
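The first two steps can be sketched in plain Python, treating polynomials as functions rather than symbolic objects. This is an illustration under stated assumptions, not the Macaulay 2 implementation: the time series is the three-variable Z3 example (2,2,1), (0,1,2), (1,0,1), (0,1,0) used later in the deck, the particular solution is built from Lagrange-style separator functions, and the names `separator`, `f0`, `g`, and `f_alt` are hypothetical.

```python
P = 3  # the field Z3

# Time series (2,2,1) -> (0,1,2) -> (1,0,1) -> (0,1,0).
series = [(2, 2, 1), (0, 1, 2), (1, 0, 1), (0, 1, 0)]
inputs, outputs = series[:-1], series[1:]

def separator(i, points, p=P):
    """Function that is 1 at points[i] and 0 at every other listed point."""
    si = points[i]
    factors = []
    for j, sj in enumerate(points):
        if j == i:
            continue
        # first coordinate in which the two states differ
        l = next(k for k in range(len(si)) if si[k] != sj[k])
        inv = pow((si[l] - sj[l]) % p, p - 2, p)  # inverse via Fermat's little theorem
        factors.append((l, sj[l], inv))

    def r(x):
        value = 1
        for l, c, inv in factors:
            value = value * (x[l] - c) * inv % p
        return value

    return r

separators = [separator(i, inputs) for i in range(len(inputs))]

def f0(x):
    """Particular solution: f0(s_i) = s_{i+1} on the data."""
    return tuple(
        sum(out[c] * r(x) for out, r in zip(outputs, separators)) % P
        for c in range(len(x))
    )

def g(x):
    """x1 + x2 - 1, a polynomial that vanishes on every state in the series."""
    return (x[0] + x[1] - 1) % P

def f_alt(x):
    """Another member of f0 + I: add g to the first coordinate only."""
    y = f0(x)
    return ((y[0] + g(x)) % P,) + y[1:]

for s, nxt in zip(inputs, outputs):
    assert f0(s) == nxt
    assert f_alt(s) == nxt
```

Both `f0` and `f_alt` reproduce the data but disagree elsewhere in the state space, which is why the final step (reduction modulo the ideal) is needed to pick a canonical representative.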


Gröbner Bases

Let > be a term order, I an ideal in k[x1,…,xn], and f a polynomial.

G = {g1,…,gm} ⊂ I is a Gröbner basis for I if the leading term of every f in I is divisible by the leading term of some gi under >.

The normal form of f with respect to G: NF(f, G) = the remainder of f on division by G.

Gröbner bases exist (not unique).

NF(f, G) is unique.
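To make the normal form concrete, here is a small multivariate division routine over Z3 with the grevlex order x1 > x2 > x3. Everything in it is an assumption of this sketch: the dictionary representation of polynomials, the helper names, and in particular the basis G, which was worked out by hand for the three points (2,2,1), (0,1,2), (1,0,1) and is not taken from the slides.

```python
P = 3  # coefficients in Z3

def grevlex_key(m):
    """Sort key: a larger key means a larger monomial under grevlex, x1 > x2 > x3."""
    return (sum(m), tuple(-e for e in reversed(m)))

def leading(poly):
    """Exponent tuple of the leading monomial."""
    return max(poly, key=grevlex_key)

def evaluate(poly, x, p=P):
    """Value of a polynomial {exponents: coeff} at the point x, modulo p."""
    total = 0
    for m, c in poly.items():
        term = c
        for xi, e in zip(x, m):
            term *= xi ** e
        total += term
    return total % p

def normal_form(f, G, p=P):
    """Remainder of f on division by the list G."""
    f = {m: c % p for m, c in f.items() if c % p}
    remainder = {}
    while f:
        lt = leading(f)
        c = f[lt]
        for g in G:
            lg = leading(g)
            if all(a >= b for a, b in zip(lt, lg)):
                # cancel the leading term of f with a multiple of g
                shift = tuple(a - b for a, b in zip(lt, lg))
                factor = c * pow(g[lg], p - 2, p) % p
                for m, cg in g.items():
                    mm = tuple(a + b for a, b in zip(m, shift))
                    v = (f.get(mm, 0) - factor * cg) % p
                    if v:
                        f[mm] = v
                    else:
                        f.pop(mm, None)
                break
        else:
            # no leading term of G divides lt: move it to the remainder
            remainder[lt] = c
            del f[lt]
    return remainder

points = [(2, 2, 1), (0, 1, 2), (1, 0, 1)]
G = [
    {(1, 0, 0): 1, (0, 1, 0): 1, (0, 0, 0): 2},                # x1 + x2 - 1
    {(0, 2, 0): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 2},  # x2^2 + x2 + x3 - 1
    {(0, 1, 1): 1, (0, 1, 0): 2, (0, 0, 1): 2, (0, 0, 0): 1},  # x2*x3 - x2 - x3 + 1
    {(0, 0, 2): 1, (0, 0, 0): 2},                              # x3^2 - 1
]

nf = normal_form({(1, 1, 0): 1}, G)  # NF(x1*x2, G)
print(nf)
```

The remainder involves only the monomials 1, x2, x3, which are exactly those not divisible by any leading term of G, and it takes the same values as x1*x2 on the three points.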


RE Methods using PDSs

• R Laubenbacher, BS. 2004: Gröbner bases (GB)
• E Allen, J Fetrow, L Daniel, S Thomas, D John. 2006: GB and Deegan–Packel Index of Power (DPIP)
• E Dimitrova, A Jarrah, R Laubenbacher, BS. 2007: Gröbner fan (GF) and DPIP
• D Heldt, M Kreuzer, S Pokutta, H Poulisse. 2006: Approximate GB
• P Vera-Licona. 2007: Evolutionary algorithm
• A Jarrah, R Laubenbacher, M Stillman, BS. 2007: Minimal sets (MS)
• BS, A Jarrah, R Laubenbacher, P Mendes. 2007: MS and GF


Reverse Engineering using PDSs

Stigler, Jarrah, Laubenbacher, Mendes. Reverse engineering of dynamic networks. NY Acad Sci 2007
Jarrah, Laubenbacher, Stillman, Stigler. Reverse engineering of polynomial dynamical systems. Adv Appl Math 2006

[Workflow figure. Panels: Experim. data (plot of G1, G2, G3 over t1,…,t5); Mutual information; Discrete data; Ideal <-> variety; Model space; Primary decomposition; Minimal WD; Gröbner fan; Minimal PDS]

Discrete data:

Time  t1  t2  t3  t4  t5
x1     1   2   0   0   0
x2     0   1   2   1   1
x3     0   2   0   0   0

Model space: { f + I | f (ti) = ti+1, I = < g(x) : g(ti)=0 >}


Computing Minimal WDs

Primary decomposition produces minimal sets of variables required to define a PDS, thereby computing the intersection of all wiring diagrams.

Data for x2 (input state → next value of x2):

(1 0 0) → 1
(2 1 2) → 2
(0 2 0) → 1
(0 1 0) → 1

Encode: <x1x2x3, x1x3> = <x1x3> = <x1> ∩ <x3>

Interpret: x1->x2 (or x3->x2) in all WDs

Jarrah et al. Reverse engineering of polynomial dynamical systems. Adv Appl Math 2007
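For a squarefree monomial ideal, the primary components correspond to minimal sets of variables that meet every generator, so the example can be reproduced by a brute-force search. This is a sketch under that assumption, using the four input/output pairs for x2 from this slide; the function names are hypothetical and the search is exponential in the number of variables, unlike a real primary decomposition routine.

```python
from itertools import combinations

# Input states and the next value of x2, as listed on the slide.
data = [((1, 0, 0), 1), ((2, 1, 2), 2), ((0, 2, 0), 1), ((0, 1, 0), 1)]

def difference_sets(data):
    """For each pair of inputs with different outputs, the variables in which they differ.

    Each set corresponds to one monomial generator, e.g. {1, 3} for x1*x3.
    """
    sets = set()
    for (a, ya), (b, yb) in combinations(data, 2):
        if ya != yb:
            sets.add(frozenset(i + 1 for i in range(len(a)) if a[i] != b[i]))
    return sets

def minimal_variable_sets(data):
    """Variable sets, minimal under inclusion, that meet every difference set."""
    diffs = difference_sets(data)
    n = len(data[0][0])
    found = []
    for size in range(0, n + 1):
        for cand in combinations(range(1, n + 1), size):
            c = set(cand)
            if any(set(m) <= c for m in found):
                continue  # a smaller valid set is already contained in cand
            if all(c & d for d in diffs):
                found.append(cand)
    return found

print(sorted(map(sorted, difference_sets(data))))  # generators: x1*x3 and x1*x2*x3
print(minimal_variable_sets(data))                 # minimal sets: {x1} and {x3}
```

The two minimal sets match the decomposition <x1> ∩ <x3> on the slide: either x1 alone or x3 alone is enough to explain the observed values of x2.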


Exploring the Model Space

{ f + I | f (ti) = ti+1, I = < g(x) : g(ti)=0 >}

Term orders in a “cone” give the same model.

Gröbner fan partitions the term order “space” and allows for efficient exploration of model space to find the most representative model.


Method Validation: Segment polarity network in the fruit fly

Network in cell: 15 genes, proteins
• Boolean model (Albert, Othmer 2003)
• 44 known interactions
• 6 extracellular interactions

Time series data
• Generated wildtype, knockout
• < 0.01% of 2^21 total states

4 most likely PDSs
• 82% links (36 TP, 2 FP, 8 TN)
• 100% terms for 6 fncs; 88% TP, 39.5% TN for 9 fncs
• Missing terms = unobserved interactions
• 100% fixed points

Stigler et al. Reverse-Engineering of Dynamic Networks. NY Acad of Sci 2007


Identification of Muscle Module in Caenorhabditis elegans

Genes and Development 2006

Defining the transcriptional redundancy of early bodywall muscle development in C. elegans: evidence for a unified theory of animal muscle development

Fukushige et al.


Regulatory Modules in C. elegans

Baugh et al. (Development 2005) identified tissue-identity genes (TIGs) := targets of PAL-1.

Our goals:
• Model TIG network using their published microarray time series data.
• Reconstruct muscle module.
• Identify ectoderm module.

Joint work with
• H. Chamberlin - OSU
• R. Hill - OSU
• R. Laubenbacher - VBI


Regulatory Module for TIG Network

• Time series contains 10 points.
• Data discretized to 5 states.
• Predicted modules for muscle, ectoderm.
• Most edges in muscle module supported in literature.
• New prediction for timing of regulatory interactions.

[Figure: network on pal-1, C55C2.1, unc-120, hlh-1, hnd-1]


PDS for TIG Network

Does polynomial “form” encode “phenotype”?


Conclusions

Algorithm reverse engineers networks by
• Identifying minimal WDs
• Computing all minimal PDSs on the WD.

Advantages of PDSs
• Provide compact representation of model space and framework within which to analyze the model space
• Facilitate hypothesis generation for further network exploration and discovery.

Applications to gene regulatory networks
• High identification in fruit fly network
• Reconstructed C. elegans muscle, proposed ectoderm module
• Generated new hypotheses for regulation timing
• Potential for predictions about mechanisms


Collaborations

C. elegans
• H. Chamberlin – OSU, mol. gen.
• R. Hill – OSU, molecular genetics
• R. Laubenbacher – VBI, comp. algebra

Yeast Group @ VBI

Simulated networks
• D. Camacho – Boston U, biochemistry
• E. Dimitrova – Clemson, comp. algebra
• A. Jarrah – VBI, comp. algebra
• R. Laubenbacher – VBI, comp. algebra
• P. Mendes – Manchester, biochemistry
• P. Vera Licona – DIMACS, comp. algebra

Development of theory
• W. Just – Ohio U, logic/math bio
• A. Taylor – Colorado College, comm. algebra

Development of algorithms
• W. Just – Ohio U, logic/math bio
• R. Laubenbacher – VBI, comp. algebra
• M. Stillman – Cornell, comp. algebra


Computing PDSs from Data

Data: (2 2 1) -> (0 1 2) -> (1 0 1) -> (0 1 0)

Step 1: particular solution f^0 = (f1^0, f2^0, f3^0)

[The polynomial expressions for f1^0, f2^0, f3^0 did not survive extraction from the slide.]

Step 2: I = < x1+x2-1, x2x3-x3^2+x2-x1, x2^2-x3^2+x2-x3 > = ∩ < x1 – si1,…, xn – sin >

Step 3 (requires a term order: grevlex with x1 > x2 > x3):

f1 = f1^0 mod GB(I) = – x3^2 + x3

f2 = f2^0 mod GB(I) = x3^2 – x3 + 1

f3 = f3^0 mod GB(I) = – x3^2 + x2 + 1
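A quick check, assuming arithmetic in Z3, that the reduced polynomials from Step 3 reproduce the time series on this slide. The function names `F` and `trajectory` are hypothetical helpers for this sketch.

```python
P = 3  # arithmetic modulo 3

series = [(2, 2, 1), (0, 1, 2), (1, 0, 1), (0, 1, 0)]

def F(state):
    """The Step 3 model: f1 = -x3^2 + x3, f2 = x3^2 - x3 + 1, f3 = -x3^2 + x2 + 1."""
    x1, x2, x3 = state
    return (
        (-x3**2 + x3) % P,
        (x3**2 - x3 + 1) % P,
        (-x3**2 + x2 + 1) % P,
    )

def trajectory(update, start, steps):
    """Iterate the update function from a starting state."""
    states = [start]
    for _ in range(steps):
        states.append(update(states[-1]))
    return states

# Starting from the first observation, the model regenerates the whole series.
assert trajectory(F, series[0], len(series) - 1) == series
print(trajectory(F, series[0], 6))
```

Iterating past the end of the data shows what the model predicts beyond the observations, which is one way a fitted PDS can suggest follow-up experiments.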


Arithmetic in a Finite Number System

Zp = integers modulo p = {0, 1, …, p-1}

Z12 = {0, 1, …, 11} “clock” arithmetic

[Figure: clock face whose positions are labeled with the integers from -12 to 23 that agree modulo 12]

p prime => field
p not prime => ring
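The field/ring distinction comes down to whether every nonzero element has a multiplicative inverse. A short sketch (the helper `units` is a hypothetical name) that compares Z5 with the clock arithmetic of Z12:

```python
def units(n):
    """Elements of Z_n that have a multiplicative inverse modulo n."""
    return [a for a in range(1, n) if any(a * b % n == 1 for b in range(1, n))]

# Clock arithmetic: 9 o'clock plus 5 hours is 2 o'clock.
print((9 + 5) % 12)

# In Z5 every nonzero element is invertible, so Z5 is a field.
print(units(5))

# In Z12 only some elements are invertible, so Z12 is a ring but not a field.
print(units(12))
```

Division by elements such as 2 or 3 is therefore undefined in Z12, which is why the polynomial methods in this talk work over Zp with p prime.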


Gene regulatory networks are the main objects of study in molecular SB

Interconnected biochemicals, including DNA-derived (mRNAs and proteins) and non-DNA-derived (metabolites)

DNA = recipe book

Gene = recipe

mRNA = copy of recipe

Protein = outcome of recipe

Metabolites = other “helpers”


A new mathematical modeling approach to biochemical networks, with an application to oxidative stress in yeast

• Develop mathematical tools to model biochemical networks given experimental data
• Combine continuous models (ODEs) and discrete models (PDSs)
• Apply to oxidative stress response network in the yeast S. cerevisiae

Glutathione metabolism

Yeast Group @


Transcriptomic
• 7 mutants + 1 wildtype (knockouts)
• 3 replicates
• 2 treatments (with and without CHP)
• 8 time points

Proteomic
• 7 mutants + 1 wildtype (knockouts)
• 3 replicates
• 2 treatments (with and without CHP)
• 8 time points

Metabolomic
• 7 mutants + 1 wildtype (knockouts)
• 3 replicates
• 2 treatments (with and without CHP)
• 8 time points

8 strains × 3 replicates × 2 treatments × 8 time points = 384 per data type; 3 × 384 = 1152 total data points!


Theoretical Improvements

Computing Gröbner bases (with W. Just – Ohio U)
• Implemented algorithm using LU decomposition in Macaulay 2
• Identifies essential variables (= support std mon)
• Reduces computation to ring in EV
• Complexity = O(nm^2 + m^4)
Just and Stigler. Efficiently Computing Gröbner Bases of Ideals of Points. 2007, submitted.

Computing GB structure (with A. Taylor – Colo C)
• Extended Shape Lemma for graded orders
• Connecting to term detection in statistics with solution being noiseless linear regression
Stigler and Taylor. Reverse Engineering Gröbner Bases. 2008, in preparation.