
Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich


Page 1: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Insights from Boolean Modeling of Genetic Regulatory Networks

Ilya Shmulevich

Page 2: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Part I

1. Discover and understand the underlying gene regulatory mechanisms by means of inferring them from data.

2. By using the inferred model, endeavor to make useful predictions by mathematical analysis and computer simulations.

Page 3: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Genetic networks

Complex regulatory networks among genes and their products control cell behaviors such as:
– cell cycle
– apoptosis
– cell differentiation
– communication between cells in tissues

A paramount problem is to understand the dynamical interactions among these genes, transcription factors, and signaling cascades, which govern the integrated behavior of the cell.

Analogy: circuit diagram

Page 4: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Clinical Impact

Model-based and computational analysis can
– open up a window on the physiology of an organism and disease progression;
– translate into accurate diagnosis, target identification, drug development, and treatment.

Page 5: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

What class of models should be chosen?

The selection should be made in view of
– data requirements
– goals of modeling and analysis.

Data Model Goals

Page 6: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Classical tradeoff

A “fine” model with many parameters
– may be able to capture detailed “low-level” phenomena (protein concentrations, reaction kinetics);
– requires large amounts of data for inference.

A “coarse” model with low complexity
– may succeed in capturing only “high-level” phenomena (e.g. which genes are ON/OFF);
– requires smaller amounts of data.

Page 7: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Ockham’s Razor

Underlies all scientific theory building.

Model complexity should never be made higher than what is necessary to faithfully “explain the data.”

What kind of data do we have and how much?

William of Ockham (1280-1349)

Page 8: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Boolean Networks

1. To what extent do such models represent reality?

2. Do we have the “right” type of data to infer these models?

3. What do we hope to learn from them?

Page 9: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Basic Structure of Boolean Networks

[Diagram: genes A and B are inputs to gene X.]

Boolean function:

A B | X
0 0 | 1
0 1 | 1
1 0 | 0
1 1 | 1

1 means active/expressed; 0 means inactive/unexpressed.

In this example, two genes (A and B) regulate gene X. In principle, any number of “input” genes is possible. Positive/negative feedback is also common (and necessary for homeostasis).
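As a minimal sketch, the truth table on this slide can be written as a single Boolean expression: X is 0 only when A = 1 and B = 0, which is the same as X = (NOT A) OR B. The function name `regulate_x` is an illustrative choice, not something from the slides.

```python
from itertools import product

def regulate_x(a: int, b: int) -> int:
    """Boolean rule matching the slide's truth table: X = (NOT A) OR B."""
    return int((not a) or b)

# Enumerate all input combinations to rebuild the truth table.
truth_table = {(a, b): regulate_x(a, b) for a, b in product((0, 1), repeat=2)}

for (a, b), x in truth_table.items():
    print(a, b, x)
```

With k input genes the table has 2^k rows, which is one reason the number of inputs per gene matters for inference.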

Page 10: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Dynamics of Boolean Networks

[Figure: binary states of six genes (A, B, C, D, E, F) shown at successive time steps; one row reads 0 1 1 0 0 1, and the next assigns A = 1, B = 1, C = 0, D = 1, E = 1, F = 0.]

Page 11: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

State Space of Boolean Networks

Picture generated using the program DDLab.

Equate cellular states (or fates) with attractors.

Attractor states are stable under small perturbations:
– most perturbations cause the network to flow back to the attractor;
– some genes are more important, and changing their activation can cause the system to transition to a different attractor.
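The idea of attractors and perturbations can be illustrated with a small sketch. The three-gene network below is purely hypothetical (a mutual-inhibition pair a, b plus a downstream gene c); it is not the network drawn on the slide. The code updates all genes synchronously and follows a trajectory until a state repeats.

```python
def step(state):
    """One synchronous update of a hypothetical 3-gene network (a, b, c)."""
    a, b, c = state
    return (int(not b),  # a is repressed by b
            int(not a),  # b is repressed by a
            a)           # c copies a

def find_attractor(state):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

# (1, 0, 1) is a fixed point. Flipping c gives (1, 0, 0), which flows back.
back = find_attractor((1, 0, 0))
# Flipping a instead gives (0, 0, 1), which ends up in a different attractor.
switched = find_attractor((0, 0, 1))
print(back, switched)
```

In this toy example, perturbing gene c is absorbed, while perturbing gene a moves the system to another attractor, which mirrors the distinction the slide draws between ordinary and "more important" genes.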

Page 12: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Boolean model of the yeast filamentation network

Taylor, Galitski

[Figure: network diagram with nodes Environmental Input, Ras2, Cdc42, Mpt5, Ste20, Ste11, Ste7, Kss1, Dig1/2, and Tec1-Ste12; the outcomes are Filamentous and Non-Filamentous.]

Page 13: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

But can we extract meaningful biological information from gene expression data entirely in the binary domain?

We reasoned that if genes, when quantized to only two levels (1 or 0), were not informative in separating known subclasses of tumors, then there would be little hope for Boolean inference of real genetic networks.

Page 14: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Gene expression analysis in the binary domain

By using binary gene expression data and Hamming distance as a similarity metric, a separation between different subtypes of gliomas is evident, using multidimensional scaling.

Shmulevich, I. and Zhang, W. (2002) Bioinformatics 18(4), 555-565.
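A minimal sketch of the Hamming distance between binarized expression profiles follows. The sample names and profile values are made up for illustration; the multidimensional scaling step mentioned on the slide is not shown.

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions at which two equal-length binary profiles differ."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(u, v))

# Hypothetical binarized expression profiles (one value per gene).
profiles = {
    "sample_1": [1, 0, 1, 1, 0, 0],
    "sample_2": [1, 0, 1, 0, 0, 0],
    "sample_3": [0, 1, 0, 0, 1, 1],
}

# Pairwise distances; a matrix like this is the usual input to MDS.
distances = {
    (s, t): hamming(profiles[s], profiles[t])
    for s, t in combinations(profiles, 2)
}
print(distances)
```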

Page 15: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Boolean Framework

Limited amounts of data and the noisy nature of the measurements can make useful quantitative inferences problematic and a coarse-scale qualitative modeling approach seems to be justified.

Boolean idealization enormously simplifies the modeling task.

We wish to study the collective regulatory behavior without specific quantitative details.

Boolean networks qualitatively capture typical genetic behavior.

• Albert, R. & Othmer, H.G. (2003) J. Theor. Biol. 223, 1-18.
• Mendoza, L., Thieffry, D. & Alvarez-Buylla, R.E. (1999) Bioinformatics 15, 593-606.
• Huang, S. & Ingber, D.E. (2000) Exp. Cell Res. 261, 91-103.
• Li, F., Long, T., Lu, Y., Ouyang, Q. & Tang, C. (2004) PNAS 101(14):4781-6.

Page 16: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich


Page 17: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Probabilistic Boolean Networks (PBN)

Share the appealing rule-based properties of Boolean networks.

Robust in the face of uncertainty.

Dynamic behavior can be studied in the context of Markov chains.
– Boolean networks are just special cases.

Close relationship to (dynamic) Bayesian networks
– Explicitly represent probabilistic relationships between genes. (Lähdesmäki et al. (2006) Sig. Proc., 86(4):814-834)
– Can represent the same joint probability distribution.

Allow quantification of influence of genes on other genes (stay tuned for examples).

Shmulevich et al. (2002) Proceedings of the IEEE, 90(11), 1778-1792.
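To make the link to Markov chains concrete, here is a small sketch under stated assumptions: each gene has several candidate Boolean functions, one function per gene is selected independently at each step with a fixed probability, and all genes update synchronously. The two-gene network, its functions, and the probabilities are hypothetical.

```python
from itertools import product

# Hypothetical 2-gene PBN. Each gene lists (function, selection probability)
# pairs; the probabilities for one gene sum to 1.
PBN = [
    [(lambda a, b: b, 0.75), (lambda a, b: int(a and b), 0.25)],  # gene a
    [(lambda a, b: int(not a), 1.0)],                             # gene b
]

def transition_probs(state):
    """Return {next_state: probability} for one synchronous PBN step."""
    probs = {}
    for combo in product(*PBN):  # one (function, probability) pair per gene
        nxt = tuple(f(*state) for f, _ in combo)
        p = 1.0
        for _, c in combo:
            p *= c
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs

# Full Markov transition matrix over the four network states.
states = list(product((0, 1), repeat=2))
matrix = [[transition_probs(s).get(t, 0.0) for t in states] for s in states]
for s, row in zip(states, matrix):
    print(s, row)
```

If every gene has exactly one function with probability 1, each row of the matrix contains a single 1, which is the ordinary Boolean network as a special case.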

Page 18: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Model Inference from Gene Expression Data

Two approaches:
– Coefficient of Determination (Dougherty et al. 2000)
– Best-Fit Extensions

Lähdesmäki et al. (2003) Machine Learning, 52, 147-167.

Page 19: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Coefficient of Determination (COD)

COD is used to discover associations between variables.

It measures the degree to which the expression levels of an observed gene set can be used to improve the prediction of the expression of a target gene relative to the best possible prediction in the absence of observations.

Using the COD, one can find sets of genes related multivariately to a given target gene.

Page 20: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

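COD Definition

Observed genes: $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$

Target gene: $x_i$

Optimal predictor: $f(x_{i_1}, x_{i_2}, \ldots, x_{i_k})$

$$\mathrm{COD} = \frac{\varepsilon_i - \varepsilon_{\mathrm{opt}}}{\varepsilon_i}$$

$\varepsilon_i$ is the error of the best (constant) estimate of $x_i$ in the absence of any conditional variables.

$\varepsilon_{\mathrm{opt}}$ is the optimal error achieved by $f$.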
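The definition can be sketched for binary data as follows. This is a simple resubstitution estimate (errors are measured on the same samples used to choose the predictor), which is an assumption made here for brevity and not necessarily the estimator used in the cited work. The function name `cod` and the toy data are illustrative.

```python
from collections import Counter, defaultdict

def cod(inputs, target):
    """Resubstitution estimate of the COD for binary data.

    inputs: one tuple of observed gene values per sample
    target: one 0/1 target gene value per sample
    """
    n = len(target)
    # Error of the best constant predictor: always guess the majority value.
    ones = sum(target)
    eps_const = min(ones, n - ones) / n
    if eps_const == 0:
        raise ValueError("target is constant; COD is undefined")
    # Error of the best predictor given the inputs: for every observed
    # input pattern, guess the majority target value seen with it.
    counts = defaultdict(Counter)
    for x, y in zip(inputs, target):
        counts[tuple(x)][y] += 1
    eps_opt = sum(min(c[0], c[1]) for c in counts.values()) / n
    return (eps_const - eps_opt) / eps_const

# Target is fully determined by the two observed genes (XOR): COD = 1.
print(cod([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0]))
# Observed gene carries no information about the target: COD = 0.
print(cod([(0,), (0,), (1,), (1,)], [0, 1, 0, 1]))
```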

Page 21: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Constraints During Inference

Constraining the class of predictors can have advantages:
– lessening the data requirements for reliable estimation;
– incorporating prior knowledge of the class of functions representing genetic interactions;
– certain classes of functions are more plausible from the point of view of evolution, noise resilience, network dynamics, etc.

Page 22: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Example of Constraint: Post Classes

• The class is sufficiently large (this is important for inference).

• An abundance of functions from this class will tend to prevent chaotic behavior in networks.

• Eukaryotic cells are not chaotic! (Shmulevich et al. (2005) PNAS 102(38), 13439-13444.)

• Functions from this class have a natural way to ensure robustness against noise and uncertainty.

Emil Post (1897-1954)

Shmulevich et al. (2003) PNAS 100(19), 10734-10739.

Page 23: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Post Class Constraints During Inference

We compared the Post classes to the class of all Boolean functions (i.e. no constraint) by estimating the corresponding prediction error for a set of target genes, using available gene expression data.

We found that the optimal error of Post functions compares favorably with optimal error without constraint.

A hypothesis testing-based study gives no statistically significant evidence against the use of constrained function classes (i.e. cost of constraint).

Thus, Post classes are also plausible in light of experimental data.

Page 24: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Subnetworks: Theory and Examples

Aim: discover relatively small subnetworks
– whose genes interact significantly and
– whose genes are not strongly influenced by genes outside the subnetwork.

Principle of Autonomy

Start with a ‘seed’ gene set and iteratively adjoin new genes so as to enhance subnetwork autonomy.

Page 25: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Growing Algorithm

To achieve network autonomy, both of these strengths of connections should be high.

The sensitivity of Y from the outside should be small.

Various stopping criteria can be used.

Hashimoto et al. (2004) Bioinformatics 20(8): 1241-1247.

Page 26: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich


Cancer tissues need nutrients. Gliomas are highly angiogenic.

Expression of VEGF is often elevated.

Page 27: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

VEGF is elevated in advanced-stage gliomas.

Confirmation and localization by tissue microarray

Page 28: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

VEGF protein is secreted outside the cells and binds to its receptor on the endothelial cells to promote their growth.

Page 29: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

[Figure: inferred subnetwork with nodes GRB2, FGF7, FSHR, PTK7, and VEGF.]

FGF7: member of fibroblast growth factor family
FSHR: follicle-stimulating hormone receptor
PTK7: tyrosine kinase receptor

• The protein products of all four genes are part of signal transduction pathways that involve surface tyrosine kinase receptors.

• These receptors, when activated, recruit a number of adaptor proteins to relay the signal to downstream molecules.

• GRB2 is one of the most crucial adaptors that have been identified.

• GRB2 is also a target for cancer intervention because of its link to multiple growth factor signal transduction pathways.

Page 30: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

[Figure: inferred subnetwork with nodes GRB2, GNB2, MAP kinase 1, and c-rel.]

• Molecular studies have demonstrated that activation of the protein tyrosine kinase receptor-GRB2 complex activates the ras-MAP kinase-NFκB pathway to complete the signal relay from outside the cells to the nucleus.

• GNB2 is a ras family member.

• GNB2 influences MAP kinase 1, which in turn influences c-rel, an NFκB component.

Page 31: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Such relationships should also be validated experimentally.

The networks built from our models provide valuable theoretical guidance for further experiments.

Page 32: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

• IGFBP2 is overexpressed in high-grade gliomas.

• IGFBP2 contributes to increased cell invasion.

Page 33: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

IGFBP2 is elevated in advanced-stage gliomas.

Confirmation and localization by tissue microarray

Page 34: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich


Vector

Low IGFBP2 clone

High IGFBP2 clone 1

High IGFBP2 clone 2

IGFBP2 promotes glioma cell invasion in vitro

Page 35: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich


A. Niemistö, L. Hu, O. Yli-Harja, W. Zhang, I. Shmulevich, "Quantification of in vitro cell invasion through image analysis," International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS'04), San Francisco, California, USA, Sep. 1-5, 2004.

Page 36: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

[Figure: IGFBP2 promoter region, positions -561 to +1, with binding sites for c-Myc, AP2, and NFκB; network nodes NFκB and IGFBP2.]

• A review of the literature showed that Cazals et al. (1999) indeed demonstrated that NFκB activated the IGFBP2 promoter in lung alveolar epithelial cells.

Page 37: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

[Figure: network nodes NFκB, IGFBP2, TNFR2, and ILK.]

• Higher NFκB activity in IGFBP2-overexpressing cells was also found.

• Transient transfection of an IGFBP2-expressing vector together with an NFκB promoter reporter gene construct did not lead to increased NFκB activity, suggesting an indirect effect of IGFBP2 on NFκB.

• Our real-time PCR data showed that in stable IGFBP2-overexpressing cell lines, IGFBP2 indeed enhances ILK expression.

• In addition, IGFBP2 contains an RGD domain, implying its interaction with integrin molecules.

• ILK is in the integrin signal transduction pathway.

• Studies also showed that IGFBP2 affects cell apoptosis, and TNFR2 is a known regulator of apoptosis.

Page 38: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

PBN web page

http://personal.systemsbiology.net/ilya/PBN/PBN.htm

• Reprints
• Software (BN/PBN MATLAB Toolbox)
• Posters/Presentations
• Workshops
• Links
• PBN People

Page 39: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

PBN Collaborators

Wei Zhang
Harri Lähdesmäki
Olli Yli-Harja
Jaakko Astola
Edward Dougherty
Ronaldo Hashimoto
Marcel Brun
Seungchan Kim
Edward Suh
Huai Li
Michael Bittner

Support

NIH/NIGMS R21 GM070600-01

NIH/NIGMS R01 GM072855-01

Page 40: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Part II

Page 41: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Joint work with

Page 42: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Order/Chaos

A broad body of work over the past 35 years has shown that a variety of model genetic regulatory networks behave in two broad regimes, ordered and chaotic, with an analytically and numerically demonstrated phase transition between the two.

Page 43: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

“Edge of chaos”

The boundary between order and chaos is called the complex regime or the critical phase.
– The system can undergo a kind of phase transition.
– Networks are most evolvable at the “edge of chaos.”

Living system in a variable environment:
– Strike a balance: malleability vs. stability
– Must be stable, but not so stable that it remains forever static.
– Must be malleable, but not so malleable that it is fragile in the face of perturbations.

Page 44: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Plausible and long-standing hypothesis:

Real cells lie in the ordered regime or are critical.

“Life at the edge of chaos”

There has been no experimental data supporting this hypothesis.

Page 45: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Ordered networks

Homeostasis

A modest number of small recurrent patterns of gene activity (attractors)
– plausible models of the diverse cell types (or cell fates) of an organism
– the phenotypic traits of the organism are encoded in the dynamical attractors of its underlying genetic regulatory network

Confined avalanches of gene activity changes following transient perturbations in the activity of single genes
– i.e. confined damage spreading

Page 46: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Chaotic networks

Nearby states lie on trajectories that diverge
– hence, fail to exhibit a natural basis for homeostasis

Have enormous attractors whose sizes scale exponentially with the number of genes

Exhibit vast avalanches of gene activity alterations following transient perturbations to single gene activities

Page 47: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

The model class

Random Boolean Networks (RBNs) - Kauffman (1969) “ensemble approach”
– One of the most intensively studied models of discrete dynamical systems.
– Sustained interest from biology and physics communities.
– Considered for many years as prototypes of nonlinear dynamical systems.

RBNs are:
– Structurally simple yet capable of remarkably rich complex behavior!

Page 48: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Connectivity

Mean number of input variables:

$$K = \frac{1}{n} \sum_{i=1}^{n} k_i$$

However, it is also possible to let $k_i$ be random, chosen under various distributions (e.g. scale-free).

Page 49: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Bias

The bias p of a random function is the probability that it takes on the value 1.

If p = 0.5, then the function is unbiased.

Page 50: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Connectivity, bias, and the phase transition

[Figure: phase diagram in the (p, K) plane, with p ticks at 0.25, 0.5, 0.75 and K from 0 to 20; the curve marks the critical phase.]

Critical phase (average network sensitivity equal to 1):

$$2Kp(1 - p) = 1$$

Shmulevich & Kauffman (2004) Physical Review Letters, 93(4): 048701
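The critical condition can be turned into a small calculator. This sketch takes the average network sensitivity to be 2Kp(1 - p), as in the formula on this slide, and labels a network ordered, critical, or chaotic depending on whether that value is below, equal to, or above 1. The function names and the tolerance are illustrative choices.

```python
def average_sensitivity(K, p):
    """Average network sensitivity for mean connectivity K and bias p."""
    return 2 * K * p * (1 - p)

def regime(K, p, tol=1e-9):
    """Classify a random Boolean network by its average sensitivity."""
    s = average_sensitivity(K, p)
    if abs(s - 1) <= tol:
        return "critical"
    return "ordered" if s < 1 else "chaotic"

def critical_connectivity(p):
    """Connectivity K that puts a network with bias p on the critical curve."""
    return 1 / (2 * p * (1 - p))

for K in (1, 2, 3, 4):
    print(K, regime(K, 0.5))
print(critical_connectivity(0.5))
```

For unbiased functions (p = 0.5) the critical connectivity is K = 2; biased functions tolerate higher connectivity before the network becomes chaotic.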

Page 51: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Phase transition

RBNs can be tuned to undergo a phase transition by
– tuning the connectivity K
– tuning the bias p
– tuning the scale-free exponent γ
  Aldana & Cluzel (2003) PNAS 100(15):8710-4.
– tuning abundance of functional classes
  Shmulevich et al. (2003) PNAS 100(19):10734-9.

Page 52: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Our approach

Measure and compare the complexity of time series data of HeLa cells with that of mock data generated by RBNs operating in the ordered, critical, and chaotic regimes.

We use the Lempel-Ziv (LZ) measure of complexity.

Dataset: Whitfield et al. (2002) Mol. Biol. Cell 13, 1977-2000.
– synchronized HeLa cells; 48 time points at 1-hour time intervals; 29,621 distinct genes

Page 53: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Lempel-Ziv Complexity

The algorithm parses the sequence into the shortest words that have not occurred previously, and the complexity is defined as the number of such words. Words are unique, except possibly the last one.

01100101101100100110
LZ Complexity = 7

01010101010101010101
LZ Complexity = 3

Page 54: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Lempel-Ziv Complexity Example

0*1*10*010*1101*100100*110

LZ Complexity = 7
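The parse shown on this slide can be reproduced with a short sketch. It assumes the parsing rule implied by the example: starting where the previous word ended, a word is extended until it no longer appears anywhere in the sequence read so far (overlap with the word itself is allowed), so each word is the shortest one not yet seen. The function names are illustrative.

```python
def lz_parse(sequence):
    """Split a string into Lempel-Ziv words (shortest not-yet-seen words)."""
    words = []
    start = 0
    n = len(sequence)
    while start < n:
        length = 1
        # Extend the word while it can still be copied from the text that
        # precedes its final symbol.
        while (start + length <= n
               and sequence[start:start + length]
               in sequence[:start + length - 1]):
            length += 1
        # Slicing past the end simply yields the (possibly repeated) last word.
        words.append(sequence[start:start + length])
        start += length
    return words

def lz_complexity(sequence):
    """LZ complexity: the number of words in the parse."""
    return len(lz_parse(sequence))

print("*".join(lz_parse("01100101101100100110")))
print(lz_complexity("01010101010101010101"))
```

A simple quadratic-time implementation like this is adequate for short series such as the 48-point time courses discussed here.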

Page 55: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Lempel-Ziv Complexity: some remarks

“Universal” complexity measure

Basis of powerful lossless compression schemes (ZIP, GIF, etc.)
– by replacing words with a pointer to a previous occurrence of the same word

Optimal: compression rate approaches the entropy of the random sequence

Asymptotically Gaussian: can be used for statistical test of randomness.

Page 56: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Intuition

Genes in ordered networks have low LZ complexities.

Genes in chaotic networks have high LZ complexities.

Page 57: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

Binarization

We used the well-known k-means algorithm with two groups, corresponding to the two binary values (0, 1).
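A minimal sketch of two-group k-means on a single gene's time series is given below. Initializing the two centers at the minimum and maximum value is an assumption made here for determinism; the slides do not say how the algorithm was initialized. The function name and sample values are illustrative.

```python
def binarize_two_means(values, max_iter=100):
    """Binarize a 1-D series with 2-means: 0 = low cluster, 1 = high cluster."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # constant series: nothing to separate
    labels = []
    for _ in range(max_iter):
        # Assign each value to the nearer center (ties go to the low group).
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return labels

# Hypothetical expression values for one gene over six time points.
print(binarize_two_means([0.1, 0.2, 0.15, 2.0, 2.2, 1.9]))
```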

Page 58: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

[Figure: Lempel-Ziv complexity distributions of binarized HeLa data vs. random binary data. x-axis: LZ complexity (2 to 14); y-axis: 0 to 0.5; series: Random, Data.]

Page 59: Insights from Boolean Modeling of Genetic Regulatory Networks ilya shmulevich

[Figure: analysis pipeline.
HeLa time-series data (29,621 genes by 48 time points) → Binarize → LZ complexities.
RBN (ordered, critical, chaotic) → LZ complexities.
Example binary series: 01101001101001101011 and 10011001100100110110.
Compute distance between the LZ complexity distributions; find minimum.
Histogram panels (x-axis: LZ complexity, 2 to 14; y-axis: 0 to 0.5): thresholding permuted (= 1), k-means permuted, thresholding data (= 1, 2, 3), k-means data.]


Distance between LZ distributions

Kullback-Leibler (KL) distance:

$$D(p, q) = \sum_{i=1}^{m} p_i \log(p_i / q_i)$$

Euclidean distance:

$$E(p, q) = \left( \sum_{i=1}^{m} (p_i - q_i)^2 \right)^{1/2}$$
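Both distances can be computed directly from two histograms over the same m bins. The sketch below assumes natural logarithms (the slide does not give the base), treats terms with p_i = 0 as zero, and returns infinity when q_i = 0 but p_i > 0; the function names are illustrative.

```python
import math


def kl_distance(p, q):
    """Kullback-Leibler distance D(p, q) = sum_i p_i * log(p_i / q_i)."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention: 0 * log(0 / q_i) contributes nothing
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


def euclidean_distance(p, q):
    """Euclidean distance E(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)))


if __name__ == "__main__":
    data = [0.1, 0.4, 0.5]
    model = [0.2, 0.3, 0.5]
    print(kl_distance(data, model), euclidean_distance(data, model))
```

The KL distance is not symmetric in its arguments, so the choice of which distribution plays the role of p matters, whereas the Euclidean distance is symmetric.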


Three techniques to tune ordered, critical, and chaotic regimes.

1. Fix p = 0.5, let K = 1, 2, 3, 4.

2. Fix K = 4, let p = 0.93301, 0.85355, 0.75, 0.5.

3. Scale-free topology with connectivity K(γ). Vary scale-free exponent γ such that average network sensitivity is equal to the cases above. (Aldana & Cluzel (2003) PNAS, 100(15):8710-4)
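The particular p values in the second technique make sense once the average sensitivity is computed. The sketch below assumes the standard expected-sensitivity formula for random Boolean networks with in-degree K and bias p, s = 2p(1 − p)K, where s < 1 is ordered, s = 1 is critical, and s > 1 is chaotic. The function name is illustrative.

```python
def average_sensitivity(K, p):
    """Expected average sensitivity of a random Boolean network.

    K is the number of inputs per gene and p is the bias, i.e. the
    probability that an entry of a random truth table equals 1.
    """
    return 2 * p * (1 - p) * K


if __name__ == "__main__":
    print("Technique 1: fix p = 0.5, vary K")
    for K in (1, 2, 3, 4):
        print(f"  K = {K}: s = {average_sensitivity(K, 0.5):.3f}")

    print("Technique 2: fix K = 4, vary p")
    for p in (0.93301, 0.85355, 0.75, 0.5):
        print(f"  p = {p}: s = {average_sensitivity(4, p):.3f}")
```

Under this formula both sweeps pass through the same sensitivities, approximately 0.5, 1, 1.5, and 2, so each p value in the second technique matches one K value in the first.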


But what about noise?

Wouldn’t noise make things look more chaotic?

There are two issues:

– In the binary domain, the compound effect of noise amounts to a certain percentage of values in the time series data being flipped from zero to one or vice versa.

– Many genes are expressed at levels that are below those corresponding to pure noise.

Fortunately, using the HeLa data, it is possible to estimate both the binary noise probability and the global “noise floor” level as follows.
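The binary noise model in the first bullet can be sketched as independent bit flips, each occurring with probability q. The independence assumption and the function name `add_binary_noise` belong to this example, not to the slides.

```python
import random


def add_binary_noise(bits, q, rng=None):
    """Flip each 0/1 value independently with probability q."""
    rng = rng or random.Random()
    return [b ^ 1 if rng.random() < q else b for b in bits]


if __name__ == "__main__":
    clean = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
    noisy = add_binary_noise(clean, q=0.35, rng=random.Random(1))
    print(clean)
    print(noisy)
```

Applying such a function to simulated network time series before computing LZ complexities is one way to put model output and noisy measurements on the same footing.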


Estimate the “noise floor”

There are 963 empty spots on the HeLa microarrays.

As a conservative estimate, for each of the 48 microarrays, we used the 95th percentile of the values of the empty spots as the noise floor level for that array.

Only those genes whose values exceed this global threshold at all time points are included for further analysis.
– Hence our criteria are very stringent.
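The filtering step can be sketched as follows. This example assumes each array has its own floor and that a gene must exceed the floor on every array; it also uses a nearest-rank percentile, which may differ from the interpolation the authors used. All names and the toy numbers are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct as an integer 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def noise_floors(empty_spots_by_array, pct=95):
    """One noise floor per array, taken from that array's empty spots."""
    return [percentile(spots, pct) for spots in empty_spots_by_array]


def genes_above_floor(expression, floors):
    """Keep genes whose value exceeds the floor at every time point.

    expression maps a gene name to its profile, one value per array.
    """
    return [gene for gene, profile in expression.items()
            if all(v > f for v, f in zip(profile, floors))]


if __name__ == "__main__":
    # Toy data: two arrays with 20 empty spots each.
    empty = [list(range(1, 21)), list(range(11, 31))]
    floors = noise_floors(empty)
    profiles = {"g1": [25, 40], "g2": [25, 20], "g3": [10, 50]}
    print(floors, genes_above_floor(profiles, floors))
```

Requiring the profile to clear the floor at all time points, rather than on average, is what makes the criterion stringent: a single low measurement removes the gene.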


Estimate the noise probability q

We made use of the replicated probes available on the arrays.
– 2001 duplicate gene profiles of 48 time points.

Keeping only those that exceeded the global threshold, we binarized each of the duplicate profiles and computed the normalized Hamming distance.

$\hat{q} = 0.35$ with a 95% bootstrap confidence interval of [0.32, 0.38].
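A sketch of the computation, assuming the estimate is the mean normalized Hamming distance over duplicate pairs and that the interval is a percentile bootstrap over pairs. The slide does not spell out either detail, so both are assumptions, as are the function names and toy profiles.

```python
import random


def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def estimate_q(pairs, n_boot=1000, seed=0):
    """Mean pairwise distance with a 95% percentile-bootstrap interval."""
    distances = [normalized_hamming(a, b) for a, b in pairs]
    q_hat = sum(distances) / len(distances)
    rng = random.Random(seed)
    # Resample the pairwise distances with replacement and re-average.
    means = sorted(
        sum(rng.choices(distances, k=len(distances))) / len(distances)
        for _ in range(n_boot)
    )
    lower = means[int(0.025 * n_boot)]
    upper = means[int(0.975 * n_boot) - 1]
    return q_hat, (lower, upper)


if __name__ == "__main__":
    duplicates = [("0110", "0011"), ("0000", "0000"), ("1111", "0000")]
    print(estimate_q(duplicates))
```

Resampling whole pairs keeps the two profiles of a duplicate together, so the interval reflects variation between probes rather than between individual time points.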


Euclidean (fix p = 0.5, tune K)

[Figure (a): Euclidean distance (y-axis, 0 to 1.4) versus q (x-axis, 0 to 0.5), with one curve each for K = 1, K = 2, K = 3, and K = 4.]

Shmulevich et al. (2005) PNAS 102(38):13439.


Kullback-Leibler (fix p = 0.5, tune K)

[Figure (b): Kullback-Leibler distance (y-axis, 0 to 20) versus q (x-axis, 0 to 0.5), with one curve each for K = 1, K = 2, K = 3, and K = 4.]

Shmulevich et al. (2005) PNAS 102(38):13439.


Euclidean (fix K = 4, tune p)

[Figure (c): Euclidean distance (y-axis, 0 to 1) versus q (x-axis, 0 to 0.5), with one curve each for p = 0.5, p = 0.75, p = 0.85355, and p = 0.93301.]

Shmulevich et al. (2005) PNAS 102(38):13439.


Kullback-Leibler (fix K = 4, tune p)

[Figure (d): Kullback-Leibler distance (y-axis, 0 to 20) versus q (x-axis, 0 to 0.5), with one curve each for p = 0.5, p = 0.75, p = 0.85355, and p = 0.93301.]

Shmulevich et al. (2005) PNAS 102(38):13439.


Euclidean, Scale-free (tune γ)

[Figure: Euclidean distance (y-axis, 0 to 1.4) versus q (x-axis, 0 to 0.5) for scale-free networks. Curves are labeled by average sensitivity equivalent to K = 1, K = 2, K = 3, K = 4, and K = 5. An inset magnifies q from 0.34 to 0.36 (distance 0.1 to 0.25).]

Shmulevich et al. (2005) PNAS 102(38):13439.


Kullback-Leibler, Scale-free (tune γ)

[Figure: Kullback-Leibler distance (y-axis, 0 to 20) versus q (x-axis, 0 to 0.5) for scale-free networks. Curves are labeled by average sensitivity equivalent to K = 1, K = 2, K = 3, K = 4, and K = 5. An inset magnifies q from 0.34 to 0.36 (distance 0.4 to 1).]

Shmulevich et al. (2005) PNAS 102(38):13439.


Concluding remarks

The results strongly suggest that HeLa cells are in the ordered regime or are critical, but not chaotic.

We cannot statistically distinguish between ordered and critical with these data.

Critical networks appear to predict the distribution of genes whose activities are altered in several hundred knock-out mutants of yeast. (Serra et al. (2004) J. Theor. Biol. 227, 149-157)

It will be important to use more realistic ensembles of model genetic networks to test whether our conclusions hold.