Exploring behaviors of stochastic differential equation models of biological systems using change of measures

RESEARCH Open Access

Exploring behaviors of stochastic differentialequation models of biological systems usingchange of measuresSumit Kumar Jha1 Christopher James Langmead23

From First IEEE International Conference on Computational Advances in Bio and medical Sciences (ICCABS2011)Orlando FL USA 3-5 February 2011

Abstract

Stochastic Differential Equations (SDE) are often used to model the stochastic dynamics of biological systemsUnfortunately rare but biologically interesting behaviors (eg oncogenesis) can be difficult to observe in stochasticmodels Consequently the analysis of behaviors of SDE models using numerical simulations can be challengingWe introduce a method for solving the following problem given a SDE model and a high-level behavioralspecification about the dynamics of the model algorithmically decide whether the model satisfies the specificationWhile there are a number of techniques for addressing this problem for discrete-state stochastic models theanalysis of SDE and other continuous-state models has received less attention Our proposed solution uses acombination of Bayesian sequential hypothesis testing non-identically distributed samples and Girsanovrsquos theoremfor change of measures to examine rare behaviors We use our algorithm to analyze two SDE models of tumordynamics Our use of non-identically distributed samples sampling contributes to the state of the art in statisticalverification and model checking of stochastic models by providing an effective means for exposing rare events inSDEs while retaining the ability to compute bounds on the probability that those events occur

BackgroundThe dynamics of biological systems are largely driven bystochastic processes and subject to random external per-turbations The consequences of such random processesare often investigated through the development and ana-lysis of stochastic models (eg [1-4]) Unfortunately thevalidation and analysis of stochastic models can be verychallenging [56] especially when the model is intendedto investigate rare but biologically significant behaviors(eg oncogenesis) The goal of this paper is to introducean algorithm for examining such rare behaviors in Sto-chastic Differential Equation (SDE) models The algo-rithm takes as input the SDE model M and a high-

level description of a dynamical behavior j (eg a for-mula in temporal logic) It then computes a statisticallyrigorous bound on the probability that the given modelexhibits the stated behavior using a combination ofbiased sampling and Bayesian Statistical Model Check-ing [78]Existing methods for validating and analyzing stochas-

tic models often require extensive Monte Carlo sam-pling of independent trajectories to verify that themodel is consistent with known data and to character-ize the modelrsquos expected behavior under various initialconditions Sampling strategies are either unbiased orbiased Unbiased sampling strategies draw trajectoriesaccording to the probability distribution implied by themodel and are thus not well-suited to investigating rarebehaviors For example if the actual probability that themodel will exhibit a given behavior is 10-10 then theexpected number of samples need to see such behaviorsis about 1010 (See Figure 1) Biased sampling strategies

Correspondence jhaeecsucfedu cjlcscmuedu1Electrical Engineering and Computer Science Department University ofCentral Florida Orlando FL 32816 USA2Computer Science Department Carnegie Mellon University Pittsburgh PA15213 USAFull list of author information is available at the end of the article

Jha and Langmead BMC Bioinformatics 2012 13(Suppl 5)S8httpwwwbiomedcentralcom1471-210513S5S8

copy 2012 Jha and Langmead licensee BioMed Central Ltd This is an Open Access article distributed under the terms of the CreativeCommons Attribution License (httpcreativecommonsorglicensesby20) which permits unrestricted use distribution andreproduction in any medium provided the original work is properly cited

in contrast can be used to increase the probability ofobserving rare events at the expense of distorting theunderlying probability distributionmeasure If thechange of measure is well-characterized (as in impor-tance sampling) these distortions can be correctedwhen computing properties of the distributionOur method uses a combination of biased sampling

and sequential hypothesis testing [910] to estimate theprobability that the model satisfies the property Brieflythe algorithm randomly perturbs the computationalmodel prior to generating each sample in order to exposerare but interesting behaviors These perturbations causea change of measure We note that in the context ofSDEs a change of measure is itself a stochastic processand so importance sampling which assumes that thechange of measure with respect to the biased distributionis known cannot be used Our technique does notrequire us to know the exact magnitudes of the changesof measure nor the Radon-Nikodym derivatives [11]Instead it ensures that that the geometric average ofthese derivatives is bounded This is a much weakerassumption than is required by importance sampling butwe will show that it is sufficient for the purposes ofobtaining bounds via sequential hypothesis testing

Related workOur method performs statistical model checking usinghypothesis testing [1213] which has been used pre-viously to analyze in a variety of domains (eg[14-1919]) including computational biology (eg

[7820-22]) There are two key differences between themethod in this paper and existing methods First exist-ing methods generate independently and identically dis-tributed (iid) samples Our method in contrastgenerates independent but non-identically distributed(non-iid) samples It does so to expose rare behaviorsSecond whereas most existing methods for statisticalmodel checking use classical statistics our methodemploys Bayesian statistics [2324] We have previouslyshown that Bayesian statistics confers advantages in thecontext of statistical verification in terms of efficiencyand flexibility [781925]

MethodsOur method draws on concepts from several differentfields We begin by briefly surveying the semantics ofstochastic differential equations a language for formallyspecifying dynamic behaviors Girsanovrsquos theorem onchange of measures and results on consistency and con-centration of Bayesian posteriors

Stochastic differential equation modelsA stochastic differential equation (SDE) [2627] is a dif-ferential equation in which some of the terms evolveaccording to Brownian Motion [28] A typical SDE is ofthe following form

dX = b(t Xt)dt + v(t Xt)dWt

where X is a system variable b is a Riemann integr-able function v is an Itō integrable function and W is

Figure 1 Observing rare behaviors in iid sampling is challenging A toy model with a one low-probability state An unbiased samplingalgorithm may require billions of samples in order to observe the lsquobadrsquo state Statistical algorithms based on iid algorithms are not suitable foranalyzing such models with rare interesting behaviors

Brownian Motion The Brownian Motion W is a contin-uous-time stochastic process satisfying the followingthree conditions

1 W0 = 02 Wt is continuous (almost surely)3 Wt has independent normally distributedincrements

bull Wt - Ws and Wtrsquo - Wsrsquo are independent if 0 le sltt ltsrsquo lttrsquobull Wt minusWs sim N (0 t minus s) where N (0 t minus s)denotes the normal distribution with mean 0 andvariance t - s Note that the symbol ~ is used toindicate ldquois distributed asldquo

Consider the time between 0 and t as divided into mdiscrete steps 0 t1 t2 tm = t The solution to a sto-chastic differential equation is the limit of the followingdiscrete difference equation as m goes to infinity

Xtk+1 minus Xtk = b(tk Xtk ) (tk+1 minus tk)

+ v(tk Xtk )(Wtk+1 minusWtk )

In what follows M will refer to a system of stochas-tic differential equations We note that a system of sto-chastic differential equations comes equipped with aninherent probability space and a natural probabilitymeasure μ Our algorithm repeatedly and randomly per-turbs the probability measure of the Brownian motionin the model M which in turn changes the underlyingmeasure in an effort to expose rare behaviors Thesechanges can be characterized using Girsanovrsquos TheoremGirsanovrsquos theorem for perturbing stochastic differentialequation modelsGiven a process θt | 0 le t le T satisfying the Novikovcondition [29] such as an SDE the following exponen-tial martingale Zt defines the change from measure P tonew measure P

Zt = exp(minusint t

0θudWuminus

u du2)

Here Zt is the Radon-Nikodym derivative of P withrespect to P for t ltT The Brownian motion Wt under

P is given by Wt = Wt +int t

0 θudu The non-stochasticcomponent of the stochastic differential equation is notaffected by change of measures Thus a change of mea-sure for SDEs is a stochastic process (unlike importancesampling for explicit probability distributions)

Specifying dynamic behaviorsNext we define a formalism for encoding high-levelbehavioral specification that our algorithm will testagainst M

Definition 1 (Adapted Finitely Monitorable) Let s bea finite-length trace from the stochastic differentialequation M A specification j is said to be adaptedfinitely monitorable (AFM) if it is possible to decidewhether s satisfies j denoted j ⊨ jCertain AFM specifications can be expressed as for-

mulas in Bounded Linear Temporal Logic (BLTL)[30-32] Informally BLTL formulas can capture theordering of eventsDefinition 2 (Probabilistic Adapted Finitely Monitor-

able) A specification j is said to be probabilisticadapted finitely monitorable (PAFM) if it is possible to(deterministically or probabilistically) decide whetherM satisfies j with probability at least θ denotedM Pgeθ (φ) Some common examples of PAFM specifications

include Probabilistic Bounded Linear Temporal Logic(PBLTL) (eg see [19]) and Continuous SpecificationLogic Note that temporal logic is only one means forconstructing specifications other formalisms can also beused like Statecharts [3334]Semantics of bounded linear temporal logic (BLTL)We define the semantics of BLTL with respect to thepaths of M Let s = (s0 Δ0) (s1 Δ1) be a sampledexecution of the model along states s0 s1 with dura-tions Δ0 Δ1 Icirc ℝ We denote the path starting atstate i by si (in particular s0 denotes the originalexecution s) The value of the state variable x in s atthe state i is denoted by V (s i x) The semantics ofBLTL is defined as follows

1 sk ⊨ x ~ v if and only if V(s k x) ~ v where v Icircℝ and ~ Icirc gt lt =2 sk ⊨ j1 orj2 if and only if sk ⊨ j1 or sk ⊨ j23 sk ⊨ j1 and j2 if and only if sk ⊨ j1 and sk ⊨ j24 sk ⊨ notj1 if and only if sk ⊨ j1 does not hold5 sk ⊨ j1U

tj2 if and only if there exists i Icirc N suchthat (a) 0 le Σ0 le l ltiΔk+lle t (b) sk+i⊨ j2 and (c) foreach 0 le j lt i sk+j⊨ j1

Statistical model validationOur algorithm performs statistical model checking usingBayesian sequential hypothesis testing [23] on non-iidsamples The statistical model checking problem is todecide whether a model M satisfies a probabilisticadapted finitely monitorable formula j with probabilityat least θ That is whether M Pgeθ (φ) where θ Icirc (01)Sequential hypothesis testingLet r be the (unknown but fixed) probability of themodel satisfying j We can now re-state the statisticalmodel checking problem as deciding between the twocomposite hypotheses H0 r ≽ θ and H1 r lt θ Here

the null hypothesis H0 indicates that M satisfies theAFM formula j with probability at least θ while thealternate hypothesis H1 denotes that M satisfies theAFM formula j with probability less than θDefinition 3 (Type I and II errors) A Type I error is

an error where the hypothesis test asserts that the nullhypothesis H0 is false when in fact H0 is true Conver-sely a Type II error is an error where the hypothesistest asserts that the null hypothesis H0 is true when infact H0 is falseThe basic idea behind any statistical model checking

algorithm based on sequential hypothesis testing is toiteratively sample traces from the stochastic processEach trace is then evaluated with a trace verifier [32]which determines whether the trace satisfies the specifi-cation ie si ⊨ j This is always feasible because the spe-cifications used are adapted and finitely monitorableTwo accumulators keep track of the total number oftraces sampled and the number of satisfying tracesrespectively The procedure continues until there isenough information to reject either H0 or H1Bayesian sequential hypothesis testingRecall that for any finite trace si of the system and anAdapted Finitely Monitorable (AFM) formula j we candecide whether si satisfies j Therefore we can define arandom variable Xi denoting the outcome of si ⊨ jThus Xi is a Bernoulli random variable with probabilitymass function f (xi|ρ) = ρxi(1minus ρ)1minusxi where xi = 1 ifand only if si ⊨q j otherwise xi = 0Bayesian statistics requires that prior probability distri-

butions be specified for the unknown quantity which isr in our case Thus we will model the unknown quan-tity as a random variable u with prior density g(u) Theprior probability distribution is usually based on ourprevious experiences and beliefs about the system Non-informative or objective prior probability distribution[35] can be used when nothing is known about theprobability of the system satisfying the AFM formulaSuppose we have a sequence of independent random

variables X1 Xn defined as above and let d = (x1xn) denote a sample of those variablesDefinition 4 The Bayes factor B of sample d and

hypotheses H0 and H1 is B =P(d|H0)P(d|H1)

The Bayes factor may be used as a measure of relativeconfidence in H0 vs H1 as proposed by Jeffreys [36]The Bayes factor is the ratio of two likelihoods

int 1θ

f (x1|u) middot middot middot f (xn|u) middot g(u)duint θ

0 f (x1|u) middot middot middot f (xn|u) middot g(u)du (1)

We note that the Bayes factor depends on both thedata d and on the prior g so it may be considered a

measure of confidence in H0 vs H1 provided by the datax1 xn and ldquoweightedrdquo by the prior gNon-iid Bayesian sequential hypothesis testingTraditional methods for hypothesis testing includingthose outlined in the previous two subsections assumethat the samples are drawn iid In this section we showthat non-iid samples can also be used provided that cer-tain conditions hold In particular if one can bound thechange in measure associated with the non-identical sam-pling one can can also bound the Type I and Type IIerrors under a change of measure Our algorithm boundsthe change of measure and thus the errorWe begin by reviewing some fundamental concepts

from Bayesian statistics including KL divergence KLsupport affinity and δ-separation and then restate animportant result on the concentration of Bayesian pos-teriors [3537]Definition 5 (Kullback-Leibler (KL) Divergence) Given

a parameterized family of probability distributions fθthe Kullback-Leibler (KL) divergence K(θ0 θ) betweenthe distributions corresponding to two parameters θ andθ0 is K (θ0 θ) = Eθ0 [fθ fθ0

] Note that Eθ0 is the expecta-tion computed under the probability measure fθ0 Definition 6 (KL Neighborhood) Given a parameter-

ized family of probability distributions fθ the KLneighborhood Kε (θ0) of a parameter value θ0 is given bythe set θ K (θ0 θ) lt εDefinition 7 (KL Support) A point θ0 is said to be in

the KL support of a prior Π if and only if for all ε gt0 Π(Kε (θ0)) gt0Definition 8 (Affinity) The affinity Aff(f g) between

any two densities is defined as Aff(f g) =int radic

fgdμ Definition 9 (Strong δ-Separation) Let A sub [0 1] and

δ gt0 The set A and the point θ0 are said to be stronglyδ-separated if and only if for any proper probability dis-tribution v on A Aff(fθ0

intA fθ(x1)v(dθ)) lt δ

Given these definitions it can be shown that the Baye-sian posterior concentrates exponentially under certaintechnical conditions [3537]Bounding errors under a change of measureNext we develop the machinery needed to computebounds on the Type-IType-II errors for a testing strat-egy based on non-iid samplesA stochastic differential equation model M is natu-

rally associated with a probability measure μ Our non-iid sampling strategy can be thought of as the assign-ment of a set of probability measures μ1 μ2 to M Each unique sample si is associated with an impliedprobability measure μi and is generated from M underμi in an iid manner Our proofs require that all theimplied probability measures are equivalent That is anevent is possible (resp impossible) under a probabilitymeasure if and only if it is possible (resp impossible)under the original probability measure μ

We use the following result regarding change of mea-sures Suppose a given behavior say j holds on the ori-ginal model with an (unknown) probability r

P(ρ lt θ0|Xi) =

int θ0

0 pμ(Xi|u) g(u) duint 10 pμ(Xi|u) g(u) du

Here Xi is a Bernoulli random variable denoting theevent that ith sample satisfies the given behavior j Notethat the Xi s must be independent of one another Nowwe can rewrite the above expression as

P (ρ lt θ0|Xi) =

int θ0

pμ(Xi|u)pμi (Xi|u)

pμi (Xi|u)g(u) du

int 10

pμi (Xi|u)g(u) du

Note that the term pμi (Xi|u)denotes the probability ofobserving the event Xi under the modified probabilitymeasure μi if the unknown probability r were u Inorder to ensure the independence assumption the newprobability measures μi are chosen independently of oneanother The ratio

pμi (Xi|u)pμ(Xi|u)

is the implied Radon-Niko-dym derivative for the change of measure between twoequivalent probability measures Suppose the testingstrategy has made n observations X1 X2 Xn Then

P (ρ lt θ0|X) =

int θ0

nprodi=1

pμi (Xi|u) g(u) du

int 10

nprodi=1

pμi (Xi|u) g(u) du

A sampling algorithm can compute pμi (Xi|u) bydrawing independent samples from a stochastic differen-tial equation model under the new ldquomodifiedrdquo probabil-ity measure We note that it is not easy to compute thechange of measure pμi (Xi|u)

pμ(Xi|u) algebraically or numericallyHowever our algorithm does not need to compute thisquantity explicitly It simply establishes bounds on itConsider the following expression that is computable

without knowing the implied Radon-Nikodym derivativeor change of measure explicitly

Q (ρ lt θ0|Xi) =

int θ0

0 pμi (Xi|u) g(u) duint 10 pμi (Xi|u) g(u) du

Now we can rewrite the above expression as

Q (ρ lt θ0|Xi) =

int θ0

pμ(Xi|u) g(u) du

int 10

pμ(Xi|u) g(u) du

Our result will exploit the fact that we do not allowour testing or sampling procedures to have arbitraryimplied Radon-Nikodym derivatives This is reasonableas no statistical guarantees should be available for anintelligently designed but adversarial test procedure that(say) tries to avoid sampling from the given behaviorSuppose that the implied Radon-Nikodym derivativealways lies between a constant c and another constant1c That is the change of measure does not distort theprobabilities of observable events by more than a factorof c Then we observe that

Q (ρ lt θ0|Xi) leint θ0

0 cpμ(Xi|u) g(u) duint 1

pμ(Xi|u) g(u) du

= c2P(ρ lt θ0|Xi)

Furthermore

Q (ρ lt θ0|Xi) ge 1c2

P (ρ lt θ0|Xi)

Thus by allowing the sampling algorithm to changemeasures by at most c we have changed the posteriorprobability of observing a behavior by at most c2Example Suppose the testing strategy has made n

observations X1 X2 Xn Then

Q (ρ lt θ0|X) =

int θ0

nprodi=1

pμ(Xi|u) g(u) du

int 10

nprodi=1

pμ(Xi|u) g(u) du

int θ0

0 cnnprod

i=1(pμ(Xi|u)) g(u) du

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

= c2nP (ρ lt θ0|X1 X2 Xn)

Similarly

Q (ρ lt θ0|X) ge 1c2n

int θ0

nprodi=1

(pμ(Xi|u)) g(u) du

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

c2nP (ρ lt θ0|X1 Xn)

Termination conditions for non-iid samplingTraditional (ie iid) Bayesian Sequential HypothesisTesting is guaranteed to terminate That is only a finitenumber of samples are required before the test selectsone of the hypotheses We now consider the conditionsunder which a Bayesian Sequential Hypothesis Testingbased procedure using non-iid samples will terminate

To do this we first need to show that the posteriorprobability distribution will concentrate on a particularvalue as we see more an more samples from the modelTo consider the conditions under which our algorithm

will terminate after observing n samples note that thefactor introduced due to the change of measure c2n canoutweigh the gain made by the concentration of theprobability measure e-nb This is not surprising becauseour construction thus far does not force the test not tobias against a sample in an intelligent way That is amaliciously designed testing procedure could simplyavoid the error prone regions of the design To addressthis we define the notion of a fair testing strategy thatdoes not engage in such malicious samplingDefinition 10 A testing strategy is h-fair (h ge 1) if

and only if the geometric average of the implied Radon-Nikodym derivatives over a number of samples is withina constant factor h of unity ie

1ηle n

radicradicradicradic nprodi=1

Note that a fair test strategy does not need to samplefrom the underlying distribution in an iid mannerHowever it must guarantee that the probability ofobserving the given behavior in a large number ofobservations is not altered substantially by the non-iidsampling Intuitively we want to make sure that we biasfor each sample as many times as we bias against itOur main result shows that such a long term neutralityis sufficient to generate statistical guarantees on anotherwise non-iid testing procedureDefinition 11 An h-fair test is said to be eventually

fair if and only if 1 le h4 lteb where b is the constant inthe exponential posterior concentration theoremThe notion of a eventually fair test corresponds to a

testing strategy that is not malicious or adversarial andis making an honest attempt to sample from all theevents in the long run

AlgorithmFinally we present our Statistical Verification algo-rithm (See Figure 2) in terms of a generic non-iidtesting procedure sampling with random ldquoimpliedrdquochange of measures Our algorithm is relatively simpleand generalizes our previous Bayesian Statistical verifi-cation algorithm [8] to non-iid samples using changeof measures The algorithm draws non-iid samplesfrom the stochastic differential equation under ran-domly chosen probability measures The algorithmensures that the implied change of measure is boundedso as to make the testing approach fair The variable ndenotes the number of samples obtained so far and x

denoted the number of samples that satisfy the AFMspecification j Based upon the samples observed wecompute the Bayes Factor under the new probabilitymeasures We know that the Bayes Factor so computedis within a factor of the original Bayes Factor underthe natural probability measure Hence the algorithmdivides the Bayes Factor by the factor h2n if the BayesFactor is larger than one If the Bayes Factor is lessthan one the algorithm multiplies the Bayes Factor bythe factor h2n

Results and discussionWe applied our algorithm to two SDE models of tumordynamics from the literature The first model is a singledimensional stochastic differential equation for theinfluence of chemotherapy on cancer cells and the sec-ond model is a pair of SDEs that describe an immuno-genic tumor

Lefever and Garay modelLefever and Garay [38] studied the growth of tumorsunder immune surveillance and chemotherapy using thefollowing stochastic differential equation

= r0x (1minus xK

)minus βx2

1 + x2+ x(1minus x

K) A0cos (ωt)

+ x (1minus xK

Here x is the amount of tumor cells A0cos(ωt)denotes the influence of a periodic chemotherapy treat-ment r0 is the linear per capita birth rate of cancercells K is the carrying capacity of the environment andb represents the influence of the immune cells Notethat Wt is the standard Brownian Motion and defaultmodel parameters were those used in [38]We demonstrate our algorithm on a simple property

of the model Namely starting with a tumor consistingof a billion cells is there at least a 1 chance that thetumor could increase to one hundred billion cells underunder immune surveillance and chemotherapy The fol-lowing BLTL specification captures the behavioral speci-fication

Prge001(F10(xgt 1011))

Figure 3 contrasts the number of samples needed todecide whether the model satisfies the property using iid and non-iid As expected the number of samplesrequired increases linearly in the logarithm of the Bayesfactor regardless of whether iid or non-iid samplingis used However non-iid sampling always requiresfewer samples than iid sampling Moreover the differ-ence between the number of samples increases with theBayes factor That is the lines are diverging

Figure 2 Non-iid Statistical Verification Algorithm The figure illustrated the non-iid Bayesian model validation algorithm The algorithmbuilds upon Girsanovrsquos theorem on change of measure and Bayesian model validation

Figure 3 Comparison of iid and non-iid sampling Non-iid vs iid sampling based verification for the Lefever and Garay model

We note that there are circumstances when our algo-rithm may require more samples than one based on iid sampling This will happen when the property beingtested has relatively high probability For example wetested the property that the probability of eradicatingthe tumor is at most 1 The iid algorithm required1374 samples to deny this possibility while the non-iidalgorithm required 1526 samples Thus our algorithm isbest used to examine rare behaviors

Nonlinear immunogenic tumor modelThe second model we analyze studies immunogenictumor growth [3940] Unlike the previous model theimmunogenic tumor model explicitly tracks thedynamics of the immune cells (variable x) in response tothe tumor cells (variable y) The SDEs are as follows

dx(t) = (a1 minus a2x (t) + a3x (t)y(t)) dt+ (b11(x(t) minus x1) + b12(y(t)minus y1)) dW1(t)

dy(t) = (b1y(t) (1minus b2y(t))minus x(t)y(t)) dt+ (b21(x(t) minus x1) + b22(y(t)minus y1)) dW2(t)

The parameters x1 an y1 denote the stochastic equili-brium point of the model Briefly the model assumesthat the amount of noise increases with the distance tothe equilibrium point

For this model we considered the following propertystarting from 01 units each of tumor and immune cellsis there at least a 1 chance that the number of tumorcells could increase to 33 units The property can beencoded into the following BLTL specification

Prge001(F10(x gt 33))

Default model parameters were those used in [39] Fig-ure 4 contrasts the number of samples needed to decidewhether the model satisfies the property using iid andnon-iid The same trends are observed as in the pre-vious model That is our algorithm requires fewer sam-ples than iid hypothesis testing and that the differencebetween these methods increases with the Bayes factorWe also considered the property that the number of

tumor cells increases to 40 units We evaluated whetherthis property is true with probability at least 0000005under a Bayes Factor of 100 000 The iid samplingalgorithm did not produce an answer even after obser-ving 10 000 samples The non-iid model validationalgorithm answered affirmatively after observing 6 736samples Once again the real impact of the proposedalgorithm lies in uncovering rare behaviors and bound-ing their probability of occurrence

Figure 4 Comparison of iid and non-iid sampling Non-iid vs iid Sampling based verification for the nonlinear Immunogenic tumormodel

DiscussionOur results confirm that non-iid sampling reduces thenumber of samples required in the context of hypothesistesting ndash when the property under consideration is rareMoreover the benefits of non-iid sampling increasewith the rarity of the property as confirmed by thedivergence of the lines in Figures 2 and 3 We notehowever that if the property isnrsquot rare then a non-iidsampling strategy will actually require a larger numberof samples than an iid strategy Thus our algorithm isonly appropriate for investigating rare behaviors

ConclusionsWe have introduced the first algorithm for verifyingproperties of stochastic differential equations usingsequential hypothesis testing Our technique combinesBayesian statistical model checking and non-iid sam-pling and provides guarantees in terms of terminationand the number of samples needed to achieve thosebounds The method is most suitable when the behaviorof interest is the exception and not the normThe present paper only considers SDEs with indepen-

dent Brownian noise We believe that these results canbe extended to handle SDEs with certain kinds of corre-lated noise Another interesting direction for futurework is the extension of these method to stochastic par-tial differential equations which are used to model spa-tially inhomogeneous processes Such analysis methodscould be used for example to investigate propertiesconcerning spatial properties of tumors the propagationof electrical waves in cardiac tissue or more generallyto the diffusion processes observed in nature

AcknowledgementsThe authors acknowledge the feedback received from the anonymousreviewers for the first IEEE Conference on Compuational Advances in Bioand Medical Sciences (ICCABS) 2011This article has been published as part of BMC Bioinformatics Volume 13Supplement 5 2012 Selected articles from the First IEEE InternationalConference on Computational Advances in Bio and medical Sciences(ICCABS 2011) Bioinformatics The full contents of the supplement areavailable online at httpwwwbiomedcentralcombmcbioinformaticssupplements13S5

Author details1Electrical Engineering and Computer Science Department University ofCentral Florida Orlando FL 32816 USA 2Computer Science DepartmentCarnegie Mellon University Pittsburgh PA 15213 USA 3Lane Center forComputational Biology Carnegie Mellon University Pittsburgh PA 15213USA

Authorsrsquo contributionsSKJ and CJL contributed equally to all parts of the paper

Competing interestsThe authors declare that they have no competing interests

Published 12 April 2012

References1 Faeder JR Blinov ML Goldstein B Hlavacek WS Rule-based modeling of

biochemical networks Complexity 2005 10(4)22-412 Haigh J Stochastic Modelling for Systems Biology by D J Wilkinson

Journal Of The Royal Statistical Society Series A 2007 170261-261[httpideasrepec orgablajorssav170y2007i1p261-261html]

3 Iosifescu M Tautu P Iosifescu M Stochastic processes and applications inbiology and medicine [by] M Iosifescu [and] P Tautu Editura AcademieiSpringer-Verlag Bucuresti New York 1973

4 Twycross J Band L Bennett MJ King J Krasnogor N Stochastic anddeterministic multiscale models for systems biology an auxin-transportcase study BMC Systems Biology 2010 4(34)[httpwww biomedcentralcom17520509434abstract]

5 Kwiatkowska M Norman G Parker D Stochastic Model Checking InFormal Methods for the Design of Computer Communication and SoftwareSystems Performance Evaluation (SFMrsquo07) Volume 4486 SpringerBernardo MHillston J 2007220-270 LNCS (Tutorial Volume)

6 Kwiatkowska M Norman G Parker D Advances and Challenges ofProbabilistic Model Checking Proc 48th Annual Allerton Conference onCommunication Control and Computing IEEE Press 2010

7 Langmead C Generalized Queries and Bayesian Statistical ModelChecking in Dynamic Bayesian Networks Application to PersonalizedMedicine Proc of the 8th International Conference on Computational SystemsBioinformatics (CSB) 2009 201-212

8 Jha SK Clarke EM Langmead CJ Legay A Platzer A Zuliani P A BayesianApproach to Model Checking Biological Systems In CMSB Volume 5688 ofLecture Notes in Computer Science SpringerDegano P Gorrieri R2009218-234

9 Wald A Sequential Analysis New York John Wiley and Son 194710 Lehmann EL Romano JP Testing statistical hypotheses 3 edition Springer

Texts in Statistics New York Springer 200511 Edgar GA Radon-Nikodym theorem Duke Mathematical Journal 1975

42447-45012 Younes HLS Simmons RG Probabilistic Verification of Discrete Event

Systems Using Acceptance Sampling In CAV Volume 2404 of Lecture Notesin Computer Science SpringerBrinksma E Larsen KG 2002223-235

13 Younes HLS Kwiatkowska MZ Norman G Parker D Numerical vsStatistical Probabilistic Model Checking An Empirical Study TACAS 200446-60

14 Lassaigne R Peyronnet S Approximate Verification of ProbabilisticSystems PAPM-PROBMIV 2002 213-214

15 Heacuterault T Lassaigne R Magniette F Peyronnet S Approximate ProbabilisticModel Checking Proc 5th International Conference on Verification ModelChecking and Abstract Interpretation (VMCAIrsquo04) Volume 2937 of LNCSSpringer 2004

16 Sen K Viswanathan M Agha G Statistical Model Checking of Black-BoxProbabilistic Systems CAV 2004 202-215

17 Grosu R Smolka S Monte Carlo Model Checking CAV 2005 271-28618 Gondi K Patel Y Sistla AP Monitoring the Full Range of omega-Regular

Properties of Stochastic Systems Verification Model Checking and AbstractInterpretation 10th International Conference VMCAI 2009 Volume 5403 ofLNCS Springer 2009 105-119

19 Jha SK Langmead C Ramesh S Mohalik S When to stop verificationStatistical Trade-off between Costs and Expected Losses Proceedings ofDesign Automation and test In Europe (DATE) 2011 1309-1314

20 Clarke EM Faeder JR Langmead CJ Harris LA Jha SK Legay A StatisticalModel Checking in BioLab Applications to the Automated Analysis of T-Cell Receptor Signaling Pathway CMSB 2008 231-250

21 Langmead CJ Jha SK Predicting Protein Folding Kinetics Via TemporalLogic Model Checking In WABI Volume 4645 of Lecture Notes in ComputerScience SpringerGiancarlo R Hannenhalli S 2007252-264

22 Jha S Langmead C Synthesis and Infeasibility Analysis for StochasticModels of Biochemical Systems using Statistical Model Checking andAbstraction Refinement

23 Jeffreys H Theory of Probability Oxford University Press 193924 Gelman A Carlin JB Stern HS Rubin DB Bayesian Data Analysis London

Chapman amp Hall 199525 Jha S Langmead C Exploring Behaviors of SDE Models of Biological

Systems using Change of Measures Proc of the 1st IEEE InternationalConference on Computational Advances in Bio and medical Sciences (ICCABS)2011 111-117

26 Oslashksendal B Stochastic Differential Equations An Introduction withApplications (Universitext) 6 edition Springer 2003 [httpwwwamazoncomexecobidosredirecttag=citeulike07-20amppath=ASIN3540047581]

27 Karatzas I Shreve SE Brownian Motion and Stochastic Calculus (GraduateTexts in Mathematics) 2 edition Springer 1991 [httpwwwamazoncomexecobidosredirecttag=citeulike07-20amppath=ASIN0387976558]

28 Wang MC Uhlenbeck GE On the Theory of the Brownian Motion IIReviews of Modern Physics 1945 17(2-3)323[httpdxdoiorg101103RevModPhys 17323]

29 Girsanov IV On Transforming a Certain Class of Stochastic Processes byAbsolutely Continuous Substitution of Measures Theory of Probability andits Applications 1960 5(3)285-301

30 Pnueli A The Temporal Logic of Programs FOCS IEEE 1977 46-5731 Owicki SS Lamport L Proving Liveness Properties of Concurrent

Programs ACM Trans Program Lang Syst 1982 4(3)455-49532 Finkbeiner B Sipma H Checking Finite Traces Using Alternating

Automata Formal Methods in System Design 2004 24(2)101-12733 Harel D Statecharts A Visual Formalism for Complex Systems Science of

Computer Programming 1987 8(3)231-274[httpciteseerxistpsueduviewdocsummarydoi=1011204799]

34 Ehrig H Orejas F Padberg J Relevance Integration and Classification ofSpecification Formalisms and Formal Specification Techniques[httpciteseerxist psueduviewdocsummarydoi=1011427137]

35 Berger J Statistical Decision Theory and Bayesian Analysis Springer-Verlag1985

36 Jeffreys H Theory of probabilityby Harold Jeffreys 3 edition Clarendon PressOxford 1961

37 Choi T Ramamoorthi RV Remarks on consistency of posteriordistributions ArXiv e-prints 2008

38 Lefever R RGaray S Biomathematics and Cell Kinetics Elsevier North-Hollanbiomedical Press 1978 chap Local description of immune tumorrejection333

39 Horhat R Horhat R Opris D The simulation of a stochastic model fortumour-immune system Proceedings of the 2nd WSEAS internationalconference on Biomedical electronics and biomedical informatics BEBIrsquo09Stevens Point Wisconsin USA World Scientific and Engineering Academyand Society (WSEAS) 2009 247-252[httpportalacmorgcitationcfmid =19465391946584]

40 Kuznetsov V Makalkin I Taylor M Perelson A Nonlinear dynamics ofimmunogenic tumors Parameter estimation and global bifurcationanalysis Bulletin of Mathematical Biology 1994 56295-321[httpdx doiorg101007BF02460644] 101007BF02460644

doi1011861471-2105-13-S5-S8Cite this article as Jha and Langmead Exploring behaviors of stochasticdifferential equation models of biological systems using change ofmeasures BMC Bioinformatics 2012 13(Suppl 5)S8

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Abstract

Background

Related work

Methods

Stochastic differential equation models

Girsanovrsquos theorem for perturbing stochastic differential equation models

Specifying dynamic behaviors

Semantics of bounded linear temporal logic (BLTL)

Statistical model validation

Sequential hypothesis testing

Bayesian sequential hypothesis testing

Non-iid Bayesian sequential hypothesis testing

Bounding errors under a change of measure

Termination conditions for non-iid sampling

Algorithm

Results and discussion

Lefever and Garay model

Nonlinear immunogenic tumor model

Discussion

Conclusions

Acknowledgements

Author details

Authors contributions

Competing interests

References

in contrast can be used to increase the probability ofobserving rare events at the expense of distorting theunderlying probability distributionmeasure If thechange of measure is well-characterized (as in impor-tance sampling) these distortions can be correctedwhen computing properties of the distributionOur method uses a combination of biased sampling

and sequential hypothesis testing [910] to estimate theprobability that the model satisfies the property Brieflythe algorithm randomly perturbs the computationalmodel prior to generating each sample in order to exposerare but interesting behaviors These perturbations causea change of measure We note that in the context ofSDEs a change of measure is itself a stochastic processand so importance sampling which assumes that thechange of measure with respect to the biased distributionis known cannot be used Our technique does notrequire us to know the exact magnitudes of the changesof measure nor the Radon-Nikodym derivatives [11]Instead it ensures that that the geometric average ofthese derivatives is bounded This is a much weakerassumption than is required by importance sampling butwe will show that it is sufficient for the purposes ofobtaining bounds via sequential hypothesis testing

Related workOur method performs statistical model checking usinghypothesis testing [1213] which has been used pre-viously to analyze in a variety of domains (eg[14-1919]) including computational biology (eg

[7820-22]) There are two key differences between themethod in this paper and existing methods First exist-ing methods generate independently and identically dis-tributed (iid) samples Our method in contrastgenerates independent but non-identically distributed(non-iid) samples It does so to expose rare behaviorsSecond whereas most existing methods for statisticalmodel checking use classical statistics our methodemploys Bayesian statistics [2324] We have previouslyshown that Bayesian statistics confers advantages in thecontext of statistical verification in terms of efficiencyand flexibility [781925]

MethodsOur method draws on concepts from several differentfields We begin by briefly surveying the semantics ofstochastic differential equations a language for formallyspecifying dynamic behaviors Girsanovrsquos theorem onchange of measures and results on consistency and con-centration of Bayesian posteriors

Stochastic differential equation modelsA stochastic differential equation (SDE) [2627] is a dif-ferential equation in which some of the terms evolveaccording to Brownian Motion [28] A typical SDE is ofthe following form

dX = b(t Xt)dt + v(t Xt)dWt

where X is a system variable b is a Riemann integr-able function v is an Itō integrable function and W is

Figure 1 Observing rare behaviors in iid sampling is challenging A toy model with a one low-probability state An unbiased samplingalgorithm may require billions of samples in order to observe the lsquobadrsquo state Statistical algorithms based on iid algorithms are not suitable foranalyzing such models with rare interesting behaviors

Zt = exp(minusint t

0θudWuminus

u du2)

int 1θ

P(ρ lt θ0|Xi) =

int θ0

P (ρ lt θ0|Xi) =

int θ0

pμi (Xi|u)g(u) du

int 10

pμi (Xi|u)g(u) du

P (ρ lt θ0|X) =

int θ0

nprodi=1

pμi (Xi|u) g(u) du

int 10

nprodi=1

pμi (Xi|u) g(u) du

Q (ρ lt θ0|Xi) =

int θ0

Q (ρ lt θ0|Xi) =

int θ0

pμ(Xi|u) g(u) du

int 10

pμ(Xi|u) g(u) du

= c2P(ρ lt θ0|Xi)

Furthermore

P (ρ lt θ0|Xi)

Q (ρ lt θ0|X) =

int θ0

nprodi=1

pμ(Xi|u) g(u) du

int 10

nprodi=1

pμ(Xi|u) g(u) du

int θ0

0 cnnprod

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

Similarly

int θ0

nprodi=1

(pμ(Xi|u)) g(u) du

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

1ηle n

= r0x (1minus xK

)minus βx2

1 + x2+ x(1minus x

K) A0cos (ωt)

+ x (1minus xK

Prge001(F10(xgt 1011))

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

Zt = exp(minusint t

0θudWuminus

u du2)

int 1θ

P(ρ lt θ0|Xi) =

int θ0

P (ρ lt θ0|Xi) =

int θ0

pμi (Xi|u)g(u) du

int 10

pμi (Xi|u)g(u) du

P (ρ lt θ0|X) =

int θ0

nprodi=1

pμi (Xi|u) g(u) du

int 10

nprodi=1

pμi (Xi|u) g(u) du

Q (ρ lt θ0|Xi) =

int θ0

Q (ρ lt θ0|Xi) =

int θ0

pμ(Xi|u) g(u) du

int 10

pμ(Xi|u) g(u) du

= c2P(ρ lt θ0|Xi)

Furthermore

P (ρ lt θ0|Xi)

Q (ρ lt θ0|X) =

int θ0

nprodi=1

pμ(Xi|u) g(u) du

int 10

nprodi=1

pμ(Xi|u) g(u) du

int θ0

0 cnnprod

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

Similarly

int θ0

nprodi=1

(pμ(Xi|u)) g(u) du

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

1ηle n

= r0x (1minus xK

)minus βx2

1 + x2+ x(1minus x

K) A0cos (ωt)

+ x (1minus xK

Prge001(F10(xgt 1011))

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

int 1θ

P(ρ lt θ0|Xi) =

int θ0

P (ρ lt θ0|Xi) =

int θ0

pμi (Xi|u)g(u) du

int 10

pμi (Xi|u)g(u) du

P (ρ lt θ0|X) =

int θ0

nprodi=1

pμi (Xi|u) g(u) du

int 10

nprodi=1

pμi (Xi|u) g(u) du

Q (ρ lt θ0|Xi) =

int θ0

Q (ρ lt θ0|Xi) =

int θ0

pμ(Xi|u) g(u) du

int 10

pμ(Xi|u) g(u) du

= c2P(ρ lt θ0|Xi)

Furthermore

P (ρ lt θ0|Xi)

Q (ρ lt θ0|X) =

int θ0

nprodi=1

pμ(Xi|u) g(u) du

int 10

nprodi=1

pμ(Xi|u) g(u) du

int θ0

0 cnnprod

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

Similarly

int θ0

nprodi=1

(pμ(Xi|u)) g(u) du

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

1ηle n

= r0x (1minus xK

)minus βx2

1 + x2+ x(1minus x

K) A0cos (ωt)

+ x (1minus xK

Prge001(F10(xgt 1011))

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

P(ρ lt θ0|Xi) =

int θ0

P (ρ lt θ0|Xi) =

int θ0

pμi (Xi|u)g(u) du

int 10

pμi (Xi|u)g(u) du

P (ρ lt θ0|X) =

int θ0

nprodi=1

pμi (Xi|u) g(u) du

int 10

nprodi=1

pμi (Xi|u) g(u) du

Q (ρ lt θ0|Xi) =

int θ0

Q (ρ lt θ0|Xi) =

int θ0

pμ(Xi|u) g(u) du

int 10

pμ(Xi|u) g(u) du

= c2P(ρ lt θ0|Xi)

Furthermore

P (ρ lt θ0|Xi)

Q (ρ lt θ0|X) =

int θ0

nprodi=1

pμ(Xi|u) g(u) du

int 10

nprodi=1

pμ(Xi|u) g(u) du

int θ0

0 cnnprod

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

Similarly

int θ0

nprodi=1

(pμ(Xi|u)) g(u) du

int 10

nprodi=1

(pμ(Xi|u)) g(u) du

1ηle n

= r0x (1minus xK

)minus βx2

1 + x2+ x(1minus x

K) A0cos (ωt)

+ x (1minus xK

Prge001(F10(xgt 1011))

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

1ηle n

= r0x (1minus xK

)minus βx2

1 + x2+ x(1minus x

K) A0cos (ωt)

+ x (1minus xK

Prge001(F10(xgt 1011))

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

Prge001(F10(x gt 33))

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

Abstract

Background

Related work

Methods











Algorithm




Discussion

Conclusions

Acknowledgements

Author details


Competing interests

References

Documents

Exploring behaviors of stochastic differential equation models of biological systems using change of measures