Hindawi Publishing Corporation
Advances in Bioinformatics
Volume 2013, Article ID 205763, 11 pages
http://dx.doi.org/10.1155/2013/205763

Research Article

Reverse Engineering Sparse Gene Regulatory Networks Using Cubature Kalman Filter and Compressed Sensing

Amina Noor,1 Erchin Serpedin,1 Mohamed Nounou,2 and Hazem Nounou3

1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA
2 Chemical Engineering Department, Texas A&M University at Qatar, 253 Texas A&M Engineering Building, Education City, P.O. Box 23874, Doha, Qatar
3 Electrical Engineering Department, Texas A&M University at Qatar, 253 Texas A&M Engineering Building, Education City, P.O. Box 23874, Doha, Qatar

Correspondence should be addressed to Amina Noor; amina@neo.tamu.edu

Received 30 November 2012; Accepted 15 April 2013

Academic Editor: Yufei Huang

Copyright © 2013 Amina Noor et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper proposes a novel algorithm for inferring gene regulatory networks which makes use of cubature Kalman filter (CKF) and Kalman filter (KF) techniques in conjunction with compressed sensing methods. The gene network is described using a state-space model. A nonlinear model for the evolution of gene expression is considered, while the gene expression data is assumed to follow a linear Gaussian model. The hidden states are estimated using CKF. The system parameters are modeled as a Gauss-Markov process and are estimated using compressed sensing-based KF. These parameters provide insight into the regulatory relations among the genes. The Cramér-Rao lower bound of the parameter estimates is calculated for the system model and used as a benchmark to assess the estimation accuracy. The proposed algorithm is evaluated rigorously using synthetic data in different scenarios which include different numbers of genes and varying numbers of sample points. In addition, the algorithm is tested on the DREAM4 in silico data sets as well as the in vivo data sets from the IRMA network. The proposed algorithm shows superior performance in terms of accuracy, robustness, and scalability.

1. Introduction

Gene regulation is one of the most intriguing processes taking place in living cells. With hundreds of thousands of genes at their disposal, cells must decide which genes to express at a particular time. As the cell development evolves, different needs and functions entail an efficient mechanism to turn the required genes on while leaving the others off. Cells can also activate new genes to respond effectively to environmental changes and perform specific roles. The knowledge of which gene triggers a particular genetic condition can help us ward off the potential harmful effects by switching that gene off. For instance, cancer can be controlled by deactivating the genes that cause it.

Gene expression is the process of generating functional gene products, for example, mRNA and protein. The level of gene functionality can be measured using microarrays or gene chips to produce the gene expression data [1]. More accurate estimation of gene expression is now possible using the RNA-Seq method. Intelligent use of such data can help improve our understanding of how the genes are interacting in a living organism [2–4]. Gene regulation is known to exhibit several modes; a couple of important ones include transcription regulation and posttranscription regulation [5]. While the theoretical applications of gene regulation are extremely promising, realizing them requires a thorough understanding of this complex process. Different genes may cooperate to produce a particular reaction, while a gene may repress another gene as well. The potential benefits of gene regulation can only be reaped if a complete and accurate picture of genetic interactions is available. A network specifying different interconnections of genes can go a long way in understanding the gene regulation mechanism. The control and interaction of genes can be described through a gene regulatory network. Such a network depicts various interdependencies among genes, where nodes of the network represent the genes and the edges between them correspond to an interaction among them. The strength of these interactions represents the extent to which a gene is affected by other genes in the network. A key ingredient of this approach is an accurate and representative modeling of gene networks. Precise modeling of a regulatory network, coupled with efficient inference and intervention algorithms, can help in devising personalized medicines and cures for genetic diseases [6].

Various methods for gene network modeling have been proposed recently in the literature with varying degrees of sophistication [7–10]. These techniques can be broadly classified as static and dynamic modeling schemes. Static modeling includes the use of correlation, statistical independence for clustering [11–13], and information theoretic criteria [14–16]. On the other hand, dynamic models provide an insight into the temporal evolution of gene expressions and hence yield a more quantitative prediction on gene network behavior [17–20]. In order to incorporate the stochasticity of gene expressions, statistical techniques have been applied [13]. A rich literature is also available on the Bayesian modeling of gene networks [21–26]. Promoted in part by the Bayesian methods, the state-space approach is a popular technique to model the gene networks [27–33], whereby the hidden states can be estimated using the Kalman filter. In the case of nonlinear functions, the extended Kalman filter (EKF) and particle filter represent feasible approaches [33, 34]. However, the EKF relies on the first-order linear approximations of nonlinearities, while the particle filter may be computationally too complex. A comprehensive review of these methods can be found in [35].

In this paper, the gene network is modeled using a state-space approach, and the cubature Kalman filter (CKF) is used to estimate the hidden states of the nonlinear model [36, 37]. The gene expressions are assumed to evolve following a sigmoid squash function, whereas a linear function is considered for the expression data. The noise is assumed to be Gaussian for both the state evolution and gene expression measurements. As the gene network is assumed sparse, any simple mean square error minimization technique will not suffice for the estimation of static parameters. Therefore, a compressed sensing-based Kalman filter (CSKF) [38] is used in conjunction with CKF for reliable estimation of parameters. In case of statistical inference, it is essential to obtain some guarantees on the performance of estimators. In this regard, the Cramér-Rao lower bound (CRB) of the parameter estimates is used as a benchmarking index to assess the mean square error (MSE) performance of the proposed estimator, which is evaluated here for a parameter vector. The performance of the proposed algorithm is tested on synthetically generated random Boolean networks in various scenarios. The algorithm is also tested using DREAM4 data sets and IRMA networks [39, 40].

The main contributions of this paper can be summarized as follows.

(1) CKF is proposed for the estimation of states, and a compressed sensing-based Kalman filter is used for the estimation of system parameters. The genes are known to interact with few other genes only, necessitating the use of a sparsity constraint for more accurate estimation. The proposed algorithm carries out online estimation of parameters and is therefore computationally efficient and particularly suitable for large gene networks.

(2) The Cramér-Rao lower bound is calculated for the estimation of unknown parameters of the system. The performance of the proposed algorithm is compared to the CRB. This comparison is significant as it shows room for improvement in the estimation of parameters.

(3) The proposed algorithm is compared with the EKF algorithm. Using the false alarm errors, true connections, and Hamming distance as fidelity criteria, rigorous simulations are carried out to assess the performance of the algorithm with the increase in the number of samples. In addition, receiver operating characteristic (ROC) curves are plotted to evaluate the algorithms for different network sizes. It is observed that the proposed algorithm outperforms EKF in terms of accuracy and precision. The proposed algorithm is then applied to the DREAM4 10-gene and 100-gene data sets to assess the algorithm accuracy. The underlying gene network for the IRMA data sets is also inferred.

The rest of this paper is organized as follows. Section 2 describes the underlying system model for the gene expressions. The proposed CKF algorithm in combination with CSKF for gene network inference is formulated in Section 3. The derivation of the CRB is shown in Section 4, and the simulation results and their interpretation are presented in Section 5. Finally, conclusions are drawn in Section 6.

2. System Model

Gene regulatory networks can be modeled as static or dynamical systems. In this work, state-space modeling is considered, which is an instance of a dynamic modeling approach and can effectively cope with time variations. The states represent gene expressions, and their evolution in time can, in general, be expressed as

$$\mathbf{x}_k = g(\mathbf{x}_{k-1}) + \mathbf{w}_k, \quad k = 1, \ldots, K, \tag{1}$$

where $K$ is the total number of data points available, $\mathbf{w}_k$ is assumed to be a zero-mean Gaussian random variable with covariance $\mathbf{Q}_k = \sigma_w^2 \mathbf{I}$, and the function $g(\cdot)$ represents the regulatory relationship between the genes and is generally nonlinear. The microarray data is a set of noisy observations and is commonly expressed as a linear Gaussian model [41]:

$$\mathbf{y}_k = h(\mathbf{x}_k) + \mathbf{v}_k, \tag{2}$$

where $\mathbf{v}_k$ is a Gaussian-distributed random variable with zero mean and covariance $\mathbf{S}_k = \sigma_v^2 \mathbf{I}$ and incorporates the uncertainty in the microarray experiments. In order to capture the gene interactions effectively, the following nonlinear state evolution model is assumed [33, 34]:

$$x_{k,n} = \sum_{m=1}^{N} b_{nm} f(x_{k-1,m}) + w_{k,n}, \quad k = 1, \ldots, K, \; n = 1, \ldots, N, \tag{3}$$

where $N$ is the total number of genes in the network and $f(\cdot)$ is the sigmoid squash function

$$f(x_{k-1,m}) = \frac{1}{1 + e^{-x_{k-1,m}}}. \tag{4}$$

This particular choice for the nonlinear function ensures that the conditional distribution of the states remains Gaussian [41]. The multiplicative constants $b_{nm}$ quantify the positive or negative relations between various genes in the network. A positive value of $b_{nm}$ implies that the $m$th gene is activating the $n$th gene, whereas a negative value implies repression [34, 42]. The absolute value of these parameters indicates the strength of interaction.
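As an illustration of the state evolution model (3)-(4) together with the linear observation model $\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k$, the following minimal Python sketch simulates a small network. The three-gene matrix `B`, the function names, and the noise levels are hypothetical choices made only for this example and are not taken from the paper.

```python
import math
import random

def sigmoid(x):
    """Sigmoid squash function f(x) = 1 / (1 + exp(-x)) from (4)."""
    return 1.0 / (1.0 + math.exp(-x))

def step(B, x_prev, sigma_w, rng):
    """One step of (3): x_{k,n} = sum_m b_{nm} f(x_{k-1,m}) + w_{k,n}."""
    f = [sigmoid(v) for v in x_prev]
    return [sum(b * fm for b, fm in zip(row, f)) + rng.gauss(0.0, sigma_w)
            for row in B]

def simulate(B, x0, K, sigma_w=0.0, sigma_v=0.0, seed=0):
    """Return states x_1..x_K and noisy observations y_k = x_k + v_k."""
    rng = random.Random(seed)
    xs, ys, x = [], [], list(x0)
    for _ in range(K):
        x = step(B, x, sigma_w, rng)
        xs.append(x)
        ys.append([v + rng.gauss(0.0, sigma_v) for v in x])
    return xs, ys

# Hypothetical 3-gene network: gene 1 activates gene 2 (b_21 > 0),
# gene 2 represses gene 3 (b_32 < 0); all other entries are zero (sparse).
B = [[0.0, 0.0, 0.0],
     [1.5, 0.0, 0.0],
     [0.0, -2.0, 0.0]]
xs, ys = simulate(B, x0=[0.0, 0.0, 0.0], K=5)
```

With both noise levels set to zero, the first state is fully determined by $f(0) = 0.5$, which makes the sign convention easy to check by hand: the activated gene moves up and the repressed gene moves down.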

The model given in (3) and (4), in the absence of any constraints, may be unidentifiable and may result in overfitted solutions [43]. Assumptions on network structures are therefore necessary to obtain a connectivity matrix that agrees with the biological knowledge. In a gene regulatory network (GRN), the genes are known to interact with only a few other genes. To this end, the coefficients $b_{nm}$ are estimated using sparsity constraints, as explained in the next section.

A discrete linear Gaussian model for the microarray data is considered, which can be expressed at the $k$th time instant as [41]

$$\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k. \tag{5}$$

Stacking the unknown parameters together, the parameter vector to be estimated is

$$\mathbf{b} \triangleq [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_N], \tag{6}$$

where $\boldsymbol{\phi}_n = [b_{n1}, \ldots, b_{nN}]$. Plugging the values of states from (3) into (5), it follows that

$$\mathbf{y}_k = \mathbf{R}_k \mathbf{b} + \mathbf{e}_k, \tag{7}$$

where

$$\mathbf{R}_k \triangleq \begin{bmatrix} \mathbf{f}_k & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{f}_k & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{f}_k \end{bmatrix}, \tag{8}$$

$$\mathbf{f}_k \triangleq [f(x_{k-1,1}), \cdots, f(x_{k-1,N})]. \tag{9}$$

Thus, the gene network inference problem boils down to the estimation of system parameters $\mathbf{b}$ using the observations $\mathbf{y}_k$, where the effective noise $\mathbf{e}_k$ is the sum of system and observation noises. The next section describes the proposed inference algorithm for sparse networks.
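The regression form $\mathbf{y}_k = \mathbf{R}_k \mathbf{b} + \mathbf{e}_k$ can be checked numerically on a toy case. The sketch below is an illustration only: it treats $\mathbf{b}$ as a column of length $N^2$ obtained by concatenating the rows of the coefficient matrix, and the helper names `build_R`, `stack`, and `matvec` are hypothetical.

```python
def build_R(f):
    """Block-diagonal R_k: row n carries f_k in columns n*N .. n*N + N - 1."""
    N = len(f)
    R = [[0.0] * (N * N) for _ in range(N)]
    for n in range(N):
        R[n][n * N:(n + 1) * N] = f
    return R

def stack(B):
    """Parameter vector b: the rows phi_n = [b_n1, ..., b_nN] concatenated."""
    return [b for row in B for b in row]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Toy check: R_k b must equal B f_k, the noise-free part of the state equation.
B = [[1.0, 2.0],
     [3.0, 4.0]]
f_k = [0.5, 0.25]
lhs = matvec(build_R(f_k), stack(B))   # R_k b
rhs = matvec(B, f_k)                   # B f_k
```

Because each row of $\mathbf{R}_k$ touches only one block of $\mathbf{b}$, every gene's incoming coefficients enter its own observation alone, which is what later allows the estimation to be organized gene by gene.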

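Figure 1: Block diagram of network inference methodology CKFS. [Flowchart with blocks: Input time-series data $\mathbf{y}$; Initialize $\mathbf{b}_0$, $\mathbf{x}_0$; CKF; CSKF; decision $k = K$ (Yes/No); Output. Signal labels: $\mathbf{y}_k$, $\mathbf{x}_k$, $\mathbf{b}_k$.]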

3. Method

In this section, the methodology proposed to infer the system parameters in (3) is described. The proposed cubature Kalman filter with sparsity constraints (CKFS) approach is succinctly illustrated in Figure 1. The specific details of this algorithm are presented next.

3.1. Cubature Kalman Filter

The Kalman filter is a Bayesian filter which provides the optimal solution to a general linear state-space inference problem of the form (1) and (2) and follows a recursive predict-update process. The underlying assumption of Gaussianity for the predictive and the likelihood densities simplifies the Kalman filter algorithm to a two-step process consisting of prediction and update of the mean and covariance of the hidden states. However, the presence of nonlinear functions in the state and measurement equations requires calculation of multidimensional integrals of the form nonlinear function × Gaussian density [36], which in general is computationally prohibitive. Several solutions to this problem have been proposed, including the EKF, which linearizes the nonlinear function by taking its first-order Taylor approximation, and the unscented Kalman filter (UKF), which approximates the probability density function (PDF) using a nonlinear transformation of the random variable. Recently, a new approach, the CKF, has been proposed, which evaluates the integrals numerically using spherical-radial cubature rules [36].

The next two subsections briefly explain the working of Bayesian filtering and the CKF solution for the nonlinear multidimensional integrals.

3.1.1. Time Update

Let the observations up to the time instant $k$ be denoted by $\mathbf{d}_k$, that is, $\mathbf{d}_k \triangleq [\mathbf{y}_1^T, \ldots, \mathbf{y}_k^T]^T$. In the prediction phase, also called the time update of the Bayesian filter, the mean and covariance of the Gaussian predictive density are computed as follows:

$$\begin{aligned} \hat{\mathbf{x}}_{k|k-1} &= E[g(\mathbf{x}_{k-1}) \mid \mathbf{d}_{k-1}], \\ \mathbf{P}_{xx,k|k-1} &= E[g(\mathbf{x}_{k-1})\, g^T(\mathbf{x}_{k-1}) \mid \mathbf{d}_{k-1}] - \hat{\mathbf{x}}_{k|k-1} \hat{\mathbf{x}}_{k|k-1}^T + \mathbf{Q}_{k-1}, \end{aligned} \tag{10}$$

where $E$ denotes the expectation operator and $\mathbf{x}_{k-1}$ is normally distributed with parameters $(\hat{\mathbf{x}}_{k-1|k-1}, \mathbf{P}_{xx,k-1|k-1})$. The noise $\mathbf{w}_k$ does not appear in the mean, and enters the covariance only through $\mathbf{Q}_{k-1}$, as a consequence of the zero-mean nature of the Gaussian noise and its independence from the past observations $\mathbf{d}_{k-1}$. The estimates $\hat{\mathbf{x}}_{k-1|k-1}$ and $\mathbf{P}_{xx,k-1|k-1}$ are assumed to be available from the previous iteration. Here, $\mathbf{P}_{xx,k|k-1}$ is an estimate of the error covariance matrix.

3.1.2. Measurement Update

Since the measurement noise is also Gaussian, the likelihood density is given by $\mathbf{y}_k \mid \mathbf{d}_{k-1} \sim \mathcal{N}(\mathbf{y}_k; \hat{\mathbf{y}}_{k|k-1}, \mathbf{P}_{yy,k|k-1})$. As the measurements become available at the $k$th time instant, the mean and covariance of the likelihood density are calculated as follows:

$$\begin{aligned} \hat{\mathbf{y}}_{k|k-1} &= E[\mathbf{y}_k \mid \mathbf{d}_{k-1}], \\ \mathbf{P}_{yy,k|k-1} &= E[\mathbf{x}_k \mathbf{x}_k^T] - \hat{\mathbf{y}}_{k|k-1} \hat{\mathbf{y}}_{k|k-1}^T + \mathbf{S}_{k-1}. \end{aligned} \tag{11}$$

The updated posterior density, obtained from the conditional joint density of states and the measurements, can be expressed as

$$\left( [\mathbf{x}_k^T \; \mathbf{y}_k^T]^T \mid \mathbf{d}_{k-1} \right) \sim \mathcal{N}\left( \begin{pmatrix} \hat{\mathbf{x}}_{k|k-1} \\ \hat{\mathbf{y}}_{k|k-1} \end{pmatrix}, \begin{pmatrix} \mathbf{P}_{xx,k|k-1} & \mathbf{P}_{xy,k|k-1} \\ \mathbf{P}_{xy,k|k-1}^T & \mathbf{P}_{yy,k|k-1} \end{pmatrix} \right), \tag{12}$$

where

$$\mathbf{P}_{xy,k|k-1} = E[\mathbf{x}_k \mathbf{x}_k^T] - \hat{\mathbf{x}}_{k|k-1} \hat{\mathbf{y}}_{k|k-1}^T \tag{13}$$

is the cross-covariance matrix between the states and the measurements. Hence, the states and the corresponding error covariance matrix are updated by calculating the innovation $\mathbf{y}_k - \hat{\mathbf{y}}_{k|k-1}$ and the Kalman gain $\mathbf{K}_{G,k}$:

$$\begin{aligned} \hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_{G,k} (\mathbf{y}_k - \hat{\mathbf{y}}_{k|k-1}), \\ \mathbf{P}_{xx,k|k} &= \mathbf{P}_{xx,k|k-1} - \mathbf{K}_{G,k} \mathbf{P}_{yy,k|k-1} \mathbf{K}_{G,k}^T, \\ \mathbf{K}_{G,k} &= \mathbf{P}_{xy,k|k-1} \mathbf{P}_{yy,k|k-1}^{-1}. \end{aligned} \tag{14}$$

The next subsection briefly describes the computation of high-dimensional integrals present in the equations above.

3.1.3. Computation of Integrals Using Spherical-Radial Cubature Points

In order to determine the expectations in (10) using a numerical integration method, a spherical-radial cubature rule is applied. This method calculates the cubature points $\mathbf{X}_{j,k-1|k-1}$ as follows [36]:

$$\mathbf{X}_{j,k-1|k-1} = \mathbf{U}_{k-1|k-1} \boldsymbol{\zeta}_j + \hat{\mathbf{x}}_{k-1|k-1}, \tag{15}$$

where $\boldsymbol{\zeta}_j = \sqrt{\ell/2}\,[\mathbf{1}]_j$, $j = 1, \ldots, \ell$, $\ell = 2N$ denotes the total number of cubature points, and $\mathbf{U}_{k-1|k-1}$ stands for the square root of the error covariance matrix, that is,

$$\mathbf{P}_{xx,k-1|k-1} = \mathbf{U}_{k-1|k-1} \mathbf{U}_{k-1|k-1}^T. \tag{16}$$

Following [36], $[\mathbf{1}]_j$ denotes the $j$th column of the matrix $[\mathbf{I}_N, -\mathbf{I}_N]$. The cubature points are propagated through the state equation

$$\mathbf{X}^{*}_{j,k|k-1} = g(\mathbf{X}_{j,k-1|k-1}). \tag{17}$$

The propagated cubature points yield the state and error covariance estimates

$$\begin{aligned} \hat{\mathbf{x}}_{k|k-1} &= \frac{1}{\ell} \sum_{j=1}^{\ell} \mathbf{X}^{*}_{j,k|k-1}, \\ \mathbf{P}_{xx,k|k-1} &= \frac{1}{\ell} \sum_{j=1}^{\ell} \mathbf{X}^{*}_{j,k|k-1} \mathbf{X}^{*T}_{j,k|k-1} - \hat{\mathbf{x}}_{k|k-1} \hat{\mathbf{x}}_{k|k-1}^T + \mathbf{Q}_{k-1}. \end{aligned} \tag{18}$$

The integrals in (11) and (14) can be evaluated in a similar manner. The next subsection explains the estimation of parameters in the system.
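The cubature time update can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes a Cholesky factor is used as the square root $\mathbf{U}$ of the covariance, and the function names are hypothetical.

```python
import math

def cholesky(P):
    """Lower-triangular U with P = U U^T (P symmetric positive definite)."""
    n = len(P)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(U[i][k] * U[j][k] for k in range(j))
            if i == j:
                U[i][j] = math.sqrt(P[i][i] - s)
            else:
                U[i][j] = (P[i][j] - s) / U[j][j]
    return U

def cubature_points(x_hat, P):
    """The 2N points x_hat +/- sqrt(N) * (columns of U), sqrt(N) = sqrt(l/2)."""
    N = len(x_hat)
    U = cholesky(P)
    scale = math.sqrt(N)
    pts = []
    for sign in (1.0, -1.0):
        for j in range(N):
            pts.append([x_hat[i] + sign * scale * U[i][j] for i in range(N)])
    return pts

def predict(x_hat, P, g, Q):
    """Propagate the points through g and average them with equal weight 1/l."""
    N = len(x_hat)
    prop = [g(p) for p in cubature_points(x_hat, P)]
    l = len(prop)
    mean = [sum(p[i] for p in prop) / l for i in range(N)]
    cov = [[sum(p[i] * p[j] for p in prop) / l - mean[i] * mean[j] + Q[i][j]
            for j in range(N)] for i in range(N)]
    return mean, cov

# Sanity check with an identity state function: the rule must return the
# input mean and covariance (plus Q, which is zero here).
x_hat = [1.0, 2.0]
P = [[4.0, 0.0], [0.0, 9.0]]
Q = [[0.0, 0.0], [0.0, 0.0]]
mean, cov = predict(x_hat, P, lambda x: list(x), Q)
```

For a linear `g` the rule reproduces the exact mean and covariance, which gives a convenient unit test before plugging in the sigmoid model.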

3.2. Estimation of Sparse Parameters Using Kalman Filter

The state estimates are obtained using the CKF as described in the previous subsection. In order to estimate the unknown parameters in the system model, one of the most commonly used methods involves stacking the parameters with the states and estimating them together. The estimation process performed in this manner is called joint estimation. Another method for the estimation of parameters consists of a two-step recursive process, which is termed dual estimation. This process estimates the states in the first step and, with the assumption that the states are known, estimates the parameters in the second step. These steps are repeated until the algorithm converges to the true values or until the available observations are exhausted. This paper makes use of the latter technique.

The vector $\mathbf{b}$, as defined in (6), is assumed to be evolving as a Gauss-Markov model. As discussed previously, the states are assumed to be known at this step. The system evolution equations can therefore be expressed as

$$\begin{aligned} \mathbf{b}_k &= \mathbf{b}_{k-1} + \boldsymbol{\eta}_{k-1}, \\ \mathbf{y}_k &= \mathbf{R}_k \mathbf{b}_k + \mathbf{e}_k, \end{aligned} \tag{19}$$

where $\boldsymbol{\eta}_k$ stands for the i.i.d. Gaussian noise and $\mathbf{R}_k$ is as defined in (8). It is observed that (19) is a system of linear equations with additive Gaussian noise, and therefore the Kalman filter is the optimal choice for the estimation of the parameter vector. Since $\boldsymbol{\eta}_k$ is zero mean, the predicted parameter vector equals the previous estimate. The standard predict and update steps involved in the Kalman filter are summarized as follows:

$$\begin{aligned} \hat{\mathbf{b}}_{k|k-1} &= \hat{\mathbf{b}}_{k-1|k-1}, \\ \mathbf{P}_{bb,k|k-1} &= \mathbf{P}_{bb,k-1|k-1} + \boldsymbol{\Sigma}_{\eta_k}, \\ \mathbf{u}_k &= \mathbf{y}_k - \mathbf{R}_{f_k} \hat{\mathbf{b}}_{k|k-1}, \\ \mathbf{K}_G &= \mathbf{P}_{bb,k|k-1} \mathbf{R}_{f_k}^T \left( \mathbf{R}_{f_k} \mathbf{P}_{bb,k|k-1} \mathbf{R}_{f_k}^T + \sigma_e^2 \mathbf{I} \right)^{-1}, \\ \hat{\mathbf{b}}_{k|k} &= \hat{\mathbf{b}}_{k|k-1} + \mathbf{K}_G \mathbf{u}_k, \\ \mathbf{P}_{bb,k|k} &= (\mathbf{I} - \mathbf{K}_G \mathbf{R}_{f_k}) \mathbf{P}_{bb,k|k-1}, \end{aligned} \tag{20}$$

where $\mathbf{K}_G$ denotes the Kalman gain, $\mathbf{P}$ represents the error covariance matrix, and $\mathbf{R}_{f_k}$ is the regression matrix $\mathbf{R}_k$ of (8), which is built from the vector $\mathbf{f}_k$ in (9).
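A minimal Python sketch of one Kalman predict/update cycle for the parameters is given below. It relies on a simplifying assumption that is not stated in the paper: because the regression matrix is block diagonal, and if the prior covariance and the noise covariances are block diagonal as well, the filter decouples by gene, so each row $\boldsymbol{\phi}_n$ can be updated from the scalar observation $y_{k,n}$. The names `kf_row_update`, `q`, and `sigma_e2` are hypothetical.

```python
def kf_row_update(phi, P, f, y_n, q, sigma_e2):
    """One Kalman predict/update cycle for a single row phi_n of coefficients.

    phi: current estimate (length N), P: its N x N covariance,
    f: regressors f_k (sigmoid of previous states), y_n: scalar observation,
    q: random-walk variance of the parameters, sigma_e2: effective noise variance.
    """
    N = len(phi)
    # Predict: random-walk model, so the mean is unchanged and P grows by q*I.
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(N)] for i in range(N)]
    # Innovation u = y - f phi.
    u = y_n - sum(fi * pi for fi, pi in zip(f, phi))
    # Gain K = P f^T / (f P f^T + sigma_e^2); the inverse is a scalar here.
    Pf = [sum(P[i][j] * f[j] for j in range(N)) for i in range(N)]
    s = sum(f[i] * Pf[i] for i in range(N)) + sigma_e2
    K = [v / s for v in Pf]
    phi = [p + k * u for p, k in zip(phi, K)]
    # Covariance update P <- (I - K f) P.
    fP = [sum(f[i] * P[i][j] for i in range(N)) for j in range(N)]
    P = [[P[i][j] - K[i] * fP[j] for j in range(N)] for i in range(N)]
    return phi, P

# Toy usage: recover a fixed row from three nearly noise-free observations.
true_phi = [1.0, -2.0]
phi, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for f in ([0.5, 0.7], [0.9, 0.2], [0.3, 0.6]):
    y = sum(a * b for a, b in zip(f, true_phi))
    phi, P = kf_row_update(phi, P, f, y, q=0.0, sigma_e2=1e-6)
```

Working row by row keeps every inversion scalar; a full implementation over the stacked vector would need an $N \times N$ inverse per time step instead.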

The Kalman filter algorithm is based on an $l_2$-norm minimization criterion. As the gene networks are known to be highly sparse, the parameter vector is expected to have only a few nonzero values. A more accurate approach for estimating such a vector would be to introduce an additional constraint on its $l_1$-norm, which is the core idea in compressed sensing [38, 44]. Such an $l_1$-norm constraint provides a unique solution to the underdetermined set of equations [45]. Therefore, instead of a simple $l_2$-norm minimization, the following constrained optimization problem is considered:

$$\min_{\hat{\mathbf{b}}_k} \left\| \hat{\mathbf{b}}_k - \mathbf{b}_k \right\|_2^2 \quad \text{s.t.} \quad \left\| \hat{\mathbf{b}}_k \right\|_1 \le \epsilon. \tag{21}$$

The importance of this constraint can be judged by the fact that, without it, the system would be rendered unidentifiable [43].

The problem (21) can be solved using a pseudomeasurement (PM) method, which incorporates the inequality constraint in (21) into the filtering process by assuming an artificial measurement $\|\hat{\mathbf{b}}_k\|_1 - \epsilon = 0$. This is concisely expressed as

$$0 = \mathbf{R}_\tau \hat{\mathbf{b}}_k - \epsilon, \qquad \mathbf{R}_\tau = [\operatorname{sign}(\hat{b}_\tau(1)), \ldots, \operatorname{sign}(\hat{b}_\tau(N))]. \tag{22}$$

The value of the covariance matrix $\boldsymbol{\Sigma}_\epsilon = \sigma_\epsilon^2 \mathbf{I}$ of the pseudonoise $\epsilon$ is selected in a similar manner as the process noise covariance in the EKF algorithm. However, it is found that large values of variances, that is, $\sigma_\epsilon^2 \ge 100$, prove sufficient in most cases [38]. Further details on selecting these parameters can be found in [38, 46]. The PM method solves (21) in a recursive manner for $K_\tau$ iterations using the following steps:

$$\begin{aligned} \mathbf{K}_G^{\tau} &= \mathbf{P}_\tau \mathbf{R}_\tau^T \left( \mathbf{R}_\tau \mathbf{P}_\tau \mathbf{R}_\tau^T + \boldsymbol{\Sigma}_\epsilon \right)^{-1}, \\ \hat{\mathbf{b}}_{\tau+1} &= (\mathbf{I} - \mathbf{K}_G^{\tau} \mathbf{R}_\tau) \hat{\mathbf{b}}_\tau, \\ \mathbf{P}_{\tau+1} &= (\mathbf{I} - \mathbf{K}_G^{\tau} \mathbf{R}_\tau) \mathbf{P}_\tau. \end{aligned} \tag{23}$$

At each $k$th time instant, $\mathbf{P}_{bb,k|k}$ and $\hat{\mathbf{b}}_{k|k}$ obtained from (20) are considered as initial values, that is, $\hat{\mathbf{b}}_1 = \hat{\mathbf{b}}_{k|k}$ and $\mathbf{P}_1 = \mathbf{P}_{bb,k|k}$, which is the error covariance matrix. The value of $K_\tau$ is equal to the number of constraints, that is, the expected number of nonzero $b_{nm}$ in the system. Possible ways for calculating $K_\tau$ include the minimum description length (MDL) principle and the Bayesian information criterion (BIC).

Algorithm 1: Network inference CKFS.

(1) Input time series data set $\mathbf{y}$
(2) Initialize $I$, $K$, $\boldsymbol{\phi}_0$, $\mathbf{x}_0$
(3) for $k = 1, \ldots, K$ do
(4)     Find the state estimates using CKF following the time and measurement update steps in Section 3
(5)     Estimate parameters $\mathbf{b}_k$ from $\mathbf{x}_k$ and $\mathbf{y}_k$ using (20)
(6)     for $\tau = 1, \ldots, K_\tau$ do
(7)         Update the parameters $\mathbf{b}_k$ using (23)
(8)     end for
(9) end for
(10) return
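The pseudo-measurement recursion can be sketched in a few lines of Python. Because the pseudo-measurement is scalar, the matrix inverse in the gain reduces to a division. The function name `pm_refine` and its default arguments are hypothetical, and the sign vector is taken over all entries of the estimate passed in.

```python
def sign(v):
    """Sign of v as -1, 0, or 1."""
    return (v > 0) - (v < 0)

def pm_refine(b, P, sigma_eps2=100.0, n_iter=1):
    """Apply the pseudo-measurement update n_iter times to (b, P).

    Each pass treats ||b||_1 = sign(b) . b as a fictitious scalar measurement
    of zero with noise variance sigma_eps2, which shrinks the l1-norm of b.
    """
    n = len(b)
    b = list(b)
    P = [row[:] for row in P]
    for _ in range(n_iter):
        r = [float(sign(v)) for v in b]                      # row vector R_tau
        Pr = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
        s = sum(r[i] * Pr[i] for i in range(n)) + sigma_eps2
        K = [v / s for v in Pr]                              # gain (column)
        rb = sum(ri * bi for ri, bi in zip(r, b))            # equals ||b||_1
        b = [bi - ki * rb for bi, ki in zip(b, K)]           # (I - K R) b
        rP = [sum(r[i] * P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] - K[i] * rP[j] for j in range(n)] for i in range(n)]
    return b, P

# Toy usage: an estimate with one exact zero and identity covariance.
b0 = [1.0, 0.0, -0.5]
P0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b1, P1 = pm_refine(b0, P0, sigma_eps2=100.0, n_iter=1)
```

In this toy case each nonzero entry moves toward zero by the same amount while the exact zero stays at zero; a smaller `sigma_eps2` or more iterations gives stronger shrinkage.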

3.3. Inference Algorithm

The network inference algorithm is summarized in Algorithm 1. The algorithm consists of a recursive process which repeats itself for the number of observations present in the time-series data. For each time sample, the state estimate is obtained using the CKF, and the parameter estimate is obtained using the KF. Since the parameters are expected to be sparse, the estimates are then refined further using the CSKF algorithm. This iterative process results in a simple and accurate algorithm for gene network inference while considering a complex nonlinear model.

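4. Cramér-Rao Bound

The performance of an estimator can be judged by comparing it with theoretical lower bounds proposed in parameter estimation theory. The CRB establishes a lower bound on the MSE of an unbiased estimator [47]. In particular, the CRB states that the covariance matrix of the estimator $\hat{\mathbf{b}}$ is lower bounded by

$$E\left[ (\hat{\mathbf{b}} - \mathbf{b}) (\hat{\mathbf{b}} - \mathbf{b})^T \right] \succeq [\mathbf{I}(\mathbf{b})]^{-1}, \tag{24}$$

where the matrix inequality $\succeq$ is to be interpreted in the semidefinite sense and $\mathbf{I}(\mathbf{b})$ is the Fisher information matrix (FIM)

$$\mathbf{I}(\mathbf{b}) = E\left[ \left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right) \left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right)^T \right]. \tag{25}$$

The CRB for gene network inference can be calculated as follows. By stacking all the observations for $k = 1, \ldots, K$, (7) can be written compactly in the matrix form

$$\mathbf{y} = \mathbf{R} \mathbf{b} + \mathbf{e}, \tag{26}$$

where $\mathbf{y} = [\mathbf{y}_1^T, \ldots, \mathbf{y}_K^T]^T$, $\mathbf{R} = [\mathbf{R}_1^T, \ldots, \mathbf{R}_K^T]^T$, and $\mathbf{e} = [\mathbf{e}_1^T, \ldots, \mathbf{e}_K^T]^T$. The PDF $p(\mathbf{y} \mid \mathbf{b})$ is expressed as

$$p(\mathbf{y} \mid \mathbf{b}) = C \exp\left( -\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T (\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2} \right), \tag{27}$$

where $C$ is a constant. The derivative of $\ln p(\mathbf{y} \mid \mathbf{b})$ can be expressed as

$$\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} = -\frac{\partial}{\partial \mathbf{b}} \left[ \frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T (\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2} \right] = \frac{\mathbf{R}^T \mathbf{y} - \mathbf{R}^T \mathbf{R} \mathbf{b}}{\sigma_e^2}. \tag{28}$$

It now follows that

$$\left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right) \left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right)^T = \frac{\mathbf{R}^T (\mathbf{y} - \mathbf{R}\mathbf{b}) (\mathbf{y} - \mathbf{R}\mathbf{b})^T \mathbf{R}}{\sigma_e^4}. \tag{29}$$

By taking the expectation of (29) and using $E[(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^T] = E[\mathbf{e}\mathbf{e}^T] = \sigma_e^2 \mathbf{I}$, the FIM in (25) is given by

$$\mathbf{I}(\mathbf{b}) = \frac{\mathbf{R}^T \mathbf{R}}{\sigma_e^2}. \tag{30}$$

The inverse of the FIM in (30) can be used to place a lower bound on the estimation error of the parameter vector $\mathbf{b}$. Figure 2 shows the comparison of the MSE of the CKFS algorithm with the CRB as a function of the number of samples $K$ for one representative gene from the eight-gene network considered in Section 5.1. It is observed that the MSE of the estimated parameters decreases with an increasing number of samples.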
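The bound can be evaluated numerically once the regressors are known. The sketch below is an illustration under one observation that follows from the block-diagonal structure of the regression matrix: $\mathbf{R}^T\mathbf{R}$ is itself block diagonal with $N$ identical blocks $\sum_k \mathbf{f}_k^T \mathbf{f}_k$, so it suffices to invert a single $N \times N$ block. The helper names and the example regressors are hypothetical, and at least $N$ linearly independent regressor vectors are assumed so that the block is invertible.

```python
def fim_block(F, sigma_e2):
    """One N x N diagonal block of the FIM: sum_k f_k^T f_k / sigma_e^2."""
    N = len(F[0])
    return [[sum(f[i] * f[j] for f in F) / sigma_e2 for j in range(N)]
            for i in range(N)]

def invert(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def crb_diag(F, sigma_e2):
    """Lower bounds on the variance of each coefficient in one row of b."""
    inv = invert(fim_block(F, sigma_e2))
    return [inv[i][i] for i in range(len(inv))]

# Hypothetical regressors f_k for a 2-gene network over three time points.
F = [[0.5, 0.7], [0.9, 0.2], [0.3, 0.6]]
bounds = crb_diag(F, sigma_e2=2e-5)
```

Since every additional time point adds a positive semidefinite term to the FIM block, the bound can only stay the same or shrink as more samples arrive, in line with the decreasing trend reported for the MSE.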

5. Results and Discussion

The simulation results of the CKFS algorithm are discussed in this section. The performance is first tested on synthetic data obtained from randomly generated Boolean networks under various scenarios and performance metrics. The algorithm is then assessed on the DREAM4 networks and the IRMA network.

51 Synthetic Data Time-series data is produced from ran-domly generated Boolean networks using the system model(3) and (5) Two scenarios are considered for this purpose

First, the comparison is performed by varying the sample size while keeping the network size fixed. The gene network consists of 8 genes (vertices) and 20 edges. In terms of network estimation, if the algorithm predicts an edge between two nodes that is not present in reality, a false alarm error (F) is said to have occurred. Conversely, if the algorithm indicates the absence of an edge that is in fact present in the real network, the error is termed a missed detection (M). The sum of these two errors, normalized over the total number of edges in the network, yields the Hamming distance. It is also important to consider the probability of predicting the true connections correctly, which is assessed by the true connections (T) metric. An algorithm with a low Hamming distance and a small false alarm error is particularly desirable, as predicting an edge erroneously can be troublesome for biologists. True connections indicate the reliability of the predictions. Figure 3 illustrates the performance of the CKFS algorithm and that of the EKF algorithm proposed in [34] in terms of the metrics described above. The same system model is assumed by both the CKFS and EKF algorithms for the purpose of this simulation. These metrics are the same as those used in [15]. The variances of the system and measurement noises, $\sigma_w^2$ and $\sigma_v^2$, respectively, are taken to be $10^{-5}$ in all the simulations and are assumed to be known.

It is noticed that EKF has a slightly lower false alarm rate when the number of samples is small; however, as the number of samples increases, CKFS yields a lower false alarm error. The Hamming distance for CKFS is also smaller than that for EKF, indicating a smaller cumulative error. True connections show a consistent behavior for the two algorithms as the number of samples is increased, with CKFS predicting connections more accurately. These experiments show the superiority of CKFS in terms of a lower error rate.

Figure 2: Cramér-Rao bound on the estimation of parameters. The MSE for one of the representative $\theta$ is shown here for a network consisting of 8 vertices. (Horizontal axis: sample size, 4 to 16; vertical axis: MSE on a log10 scale, $10^{-7}$ to $10^{0}$; curves: CRB and CKFS.)
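The edge-level metrics used in this comparison (false alarm F, missed detection M, Hamming distance, and true connections T) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `network_errors` and the 0/1 adjacency-matrix encoding are assumptions, and normalizing every metric by the number of true edges follows the caption of Figure 3.

```python
import numpy as np

def network_errors(true_adj, est_adj):
    """Edge-level error rates for a directed network (illustrative helper).

    true_adj, est_adj: square 0/1 arrays; entry [n, m] = 1 is taken to mean
    that gene m regulates gene n (an assumed encoding).
    All metrics are normalized by the number of true edges (assumption).
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    n_edges = int(true_adj.sum())

    false_alarm = int(np.sum(est_adj & ~true_adj))   # predicted, not real
    missed = int(np.sum(~est_adj & true_adj))        # real, not predicted
    true_conn = int(np.sum(est_adj & true_adj))      # real and predicted

    return {
        "F": false_alarm / n_edges,
        "M": missed / n_edges,
        "hamming": (false_alarm + missed) / n_edges,
        "T": true_conn / n_edges,
    }
```

With this normalization, T and M always sum to one, so the Hamming distance equals F + (1 - T).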

To obtain a more rigorous evaluation, the performance of the algorithms is then compared in a scenario in which the sample size is fixed and the networks have different sizes. The receiver operating characteristic (ROC) curves are plotted as performance measures. A higher area under the ROC curve (AUROC) shows more true positives for a given false positive rate and therefore indicates better classification. The performance of CKFS($N$, $E$, $K$) and EKF($N$, $E$, $K$) is shown in Figure 4, where $N$ stands for the number of nodes, $E$ represents the number of edges, and $K$ denotes the number of time points. It is observed that CKFS performs better than EKF for networks of different sizes.

Figure 3: (a), (b), and (c): False alarm errors, Hamming distance, and true connections. The synthetic networks consist of 8 vertices and 20 edges. The metric is normalized over the number of edges. CKFS gives lower error and predicts more true connections with the increase in the sample size of data. (Horizontal axis in each panel: sample size, 5 to 40; curves: EKF and CKFS.)
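An ROC curve for network inference can be traced by ranking candidate edges by a score and sweeping a threshold over that ranking. The sketch below is one way to do this and is not taken from the paper: using the magnitude of an estimated coefficient as the edge score, including every matrix entry as a candidate edge, and the function names are all assumptions. Scores are assumed distinct, and the true network must contain at least one edge and one non-edge.

```python
import numpy as np

def roc_curve_from_scores(true_adj, scores):
    """ROC points obtained by thresholding |scores| (illustrative helper)."""
    y = np.asarray(true_adj, dtype=bool).ravel()
    s = np.abs(np.asarray(scores, dtype=float)).ravel()

    order = np.argsort(-s, kind="stable")    # strongest candidate edges first
    y = y[order]

    tp = np.cumsum(y)                        # true positives at each cut
    fp = np.cumsum(~y)                       # false positives at each cut
    tpr = np.concatenate(([0.0], tp / y.sum()))
    fpr = np.concatenate(([0.0], fp / (~y).sum()))
    return fpr, tpr

def auroc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A perfect ranking places every true edge ahead of every non-edge and gives an area of 1, while a random ranking gives about 0.5.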

The complexity of the two algorithms is compared for synthetically generated networks with the number of genes equal to 10, 20, 30, and 40. The sample size is kept at 50 time points for each of these networks, and the run times of the EKF and CKFS algorithms are reported in Table 1. EKF is faster for smaller networks, but as the network size increases, its run time becomes much larger than that of CKFS. The main reason is that EKF [34] estimates the states and parameters by stacking them together, which requires large matrix multiplications at each iteration.

The benefit of performing dual estimation, as in CKFS, is that the parameters are estimated separately from the states. Since the system is linear and one-to-one for the parameters, only much smaller matrices need to be inverted, which reduces the computational complexity of the CKFS algorithm. CKFS is therefore particularly attractive for large networks.

Figure 4: ROC curves for the performance of CKFS and EKF using synthetic data. ($N$, $E$, $K$) for (a), (b), and (c): (5, 10, 20), (10, 12, 20), and (15, 19, 20). The area under the ROC curve for CKFS is more than that for EKF for various sized networks. (Axes in each panel: false positive rate versus true positive rate, both from 0 to 1; curves: CKFS and EKF.)

Table 1: Run time in seconds for EKF and CKFS algorithms for varying network sizes for synthetically generated data. The number of sample points is fixed to 50.

| Number of genes | 10 | 20 | 30 | 40 |
|---|---|---|---|---|
| EKF | 0.16 | 1.9 | 16.5 | 84 |
| CKFS | 1.2 | 4.3 | 11.5 | 24.1 |

5.2. DREAM4 Gene Networks. Several in silico networks have been produced in order to benchmark the performance of gene network inference algorithms. The Dialogue on Reverse Engineering Assessment and Methods (DREAM) in silico networks serve as one of the popular benchmarks used for this purpose [39, 48]. In this section, the performance of the CKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge.

Five networks are produced using the known GRNs of Escherichia coli and Saccharomyces cerevisiae. The data sets for each 10-gene network consist of 21 data points for five different perturbations. The inference is performed using all the perturbations. The 100-gene networks consist of data sets for ten perturbations. AUROC and the area under the precision-recall curve (AUPR) are calculated for the five networks of both data sets and are shown in Tables 2 and 3, respectively. The quantities precision and recall are defined as $P = T/(T + F)$ and $R = T/(T + M)$, respectively. For comparison purposes, the values of the two quantities for the time-series network identification (TSNI) algorithm, which exploits ordinary differential equations, are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR) for DREAM4 10-gene networks for the five different networks. Entries are AUROC with AUPR in parentheses.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.62 (0.27) | 0.63 (0.32) | 0.58 (0.21) | 0.63 (0.23) | 0.68 (0.25) |
| CKFS | 0.63 (0.40) | 0.67 (0.50) | 0.72 (0.50) | 0.75 (0.49) | 0.81 (0.42) |
| Random [39] | 0.55 (0.18) | 0.55 (0.19) | 0.55 (0.17) | 0.57 (0.17) | 0.56 (0.16) |

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR) for DREAM4 100-gene networks for the five different networks. Entries are AUROC with AUPR in parentheses.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.55 (0.02) | 0.55 (0.03) | 0.60 (0.03) | 0.54 (0.02) | 0.59 (0.03) |
| CKFS | 0.67 (0.13) | 0.57 (0.08) | 0.60 (0.10) | 0.62 (0.10) | 0.60 (0.07) |
| Random [39] | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) |

Figure 5: The inferred IRMA networks. (a), (b), and (c): gold standard, inferred network using CKFS, and inferred network using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate the edges that are correct but whose directions are reversed, and red arrows indicate false positives. (Nodes in each panel: GAL80, GAL4, SWI5, CBF1, ASH1.)
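The precision and recall definitions used for the DREAM4 evaluation, $P = T/(T + F)$ and $R = T/(T + M)$, translate directly into code. In the short sketch below, the function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the inputs are raw counts of true connections, false alarms, and missed detections.

```python
def precision_recall(t, f, m):
    """Precision P = T/(T+F) and recall R = T/(T+M) from raw counts.

    t: true connections found, f: false alarms, m: missed detections.
    Returning 0.0 for an empty denominator is an assumed convention.
    """
    precision = t / (t + f) if (t + f) > 0 else 0.0
    recall = t / (t + m) if (t + m) > 0 else 0.0
    return precision, recall
```

Evaluating this pair at every threshold of a ranked edge list gives the points of the precision-recall curve whose area is reported as AUPR.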

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which consists of expression samples at 21 time points. The switch-on data consists of 16 sample points and is obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be the same as in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because the genes are known to interact with few other genes only. The use of CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS has a smaller run time than EKF for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using the DREAM4 10-gene and 100-gene networks and the IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, “Computational systems biology,” Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, “Stochastic modeling and simulation of gene networks,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, “Understanding micro-RNA regulation: a computational perspective,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, “Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, “Current approaches to gene regulatory network modelling,” BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, “Inferring quantitative models of regulatory networks from expression data,” Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, “Clustering time series gene expression data based on sum-of-exponentials fitting,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, “Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure,” Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, “Identifying differentially expressed genes in microarray experiments with model-based variance estimation,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, “Gene clustering based on cluster-wide mutual information,” Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring connectivity of genetic regulatory networks using information theoretic criteria,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, “Inference of gene regulatory networks based on a universal minimum description length,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring gene regulatory networks from time series data using the minimum description length principle,” Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling gene expression data using dynamic Bayesian networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D'Alché-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d'Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS '11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C. A. Penfold and D. L. Wild, “How to infer gene networks from expression profiles, revisited,” Interface Focus, pp. 857–870, 2011.

[40] I. Cantone, L. Marucci, F. Iorio et al., “A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches,” Cell, vol. 137, no. 1, pp. 172–181, 2009.

[41] Y. Huang, I. M. Tienda-Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 76–97, 2009.

[42] Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.

[43] H. Xiong and Y. Choe, “Structural systems identification of genetic regulatory networks,” Bioinformatics, vol. 24, no. 4, pp. 553–560, 2008.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[45] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[46] J. D. Geeter, H. V. Brussel, and J. D. Schutter, “A smoothly constrained Kalman filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1171–1177, 1997.

[47] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New York, NY, USA, 1993.

[48] http://wiki.c2b2.columbia.edu/dream.


and the edges between them correspond to an interaction among them. The strength of these interactions represents the extent to which a gene is affected by other genes in the network. A key ingredient of this approach is an accurate and representative modeling of gene networks. Precise modeling of a regulatory network, coupled with efficient inference and intervention algorithms, can help in devising personalized medicines and cures for genetic diseases [6].

Various methods for gene network modeling have been proposed recently in the literature with varying degrees of sophistication [7–10]. These techniques can be broadly classified as static and dynamic modeling schemes. Static modeling includes the use of correlation, statistical independence for clustering [11–13], and information theoretic criteria [14–16]. On the other hand, dynamic models provide an insight into the temporal evolution of gene expressions and hence yield a more quantitative prediction on gene network behavior [17–20]. In order to incorporate the stochasticity of gene expressions, statistical techniques have been applied [13]. A rich literature is also available on the Bayesian modeling of gene networks [21–26]. Promoted in part by the Bayesian methods, the state-space approach is a popular technique to model the gene networks [27–33], whereby the hidden states can be estimated using the Kalman filter. In the case of nonlinear functions, the extended Kalman filter (EKF) and particle filter represent feasible approaches [33, 34]. However, the EKF relies on the first-order linear approximations of nonlinearities, while the particle filter may be computationally too complex. A comprehensive review of these methods can be found in [35].

In this paper, the gene network is modeled using a state-space approach, and the cubature Kalman filter (CKF) is used to estimate the hidden states of the nonlinear model [36, 37]. The gene expressions are assumed to evolve following a sigmoid squash function, whereas a linear function is considered for the expression data. The noise is assumed to be Gaussian for both the state evolution and gene expression measurements. As the gene network is assumed sparse, a simple mean square error minimization technique will not suffice for the estimation of static parameters. Therefore, a compressed sensing-based Kalman filter (CSKF) [38] is used in conjunction with CKF for reliable estimation of parameters. In case of statistical inference, it is essential to obtain some guarantees on the performance of estimators. In this regard, the Cramér-Rao lower bound (CRB) of the parameter estimates is used as a benchmarking index to assess the mean square error (MSE) performance of the proposed estimator, which is evaluated here for a parameter vector. The performance of the proposed algorithm is tested on synthetically generated random Boolean networks in various scenarios. The algorithm is also tested using DREAM4 data sets and IRMA networks [39, 40].

The main contributions of this paper can be summarized as follows.

(1) CKF is proposed for the estimation of states, and a compressed sensing-based Kalman filter is used for the estimation of system parameters. The genes are known to interact with few other genes only, necessitating the use of a sparsity constraint for more accurate estimation. The proposed algorithm carries out online estimation of parameters and is therefore computationally efficient and particularly suitable for large gene networks.

(2) The Cramér-Rao lower bound is calculated for the estimation of the unknown parameters of the system. The performance of the proposed algorithm is compared to the CRB. This comparison is significant as it shows the room for improvement in the estimation of parameters.

(3) The proposed algorithm is compared with the EKF algorithm. Using the false alarm errors, true connections, and Hamming distance as fidelity criteria, rigorous simulations are carried out to assess the performance of the algorithm with the increase in the number of samples. In addition, receiver operating characteristic (ROC) curves are plotted to evaluate the algorithms for different network sizes. It is observed that the proposed algorithm outperforms EKF in terms of accuracy and precision. The proposed algorithm is then applied to the DREAM4 10-gene and 100-gene data sets to assess the algorithm accuracy. The underlying gene network for the IRMA data sets is also inferred.

The rest of this paper is organized as follows. Section 2 describes the underlying system model for the gene expressions. The proposed CKF algorithm in combination with CSKF for gene network inference is formulated in Section 3. The derivation of the CRB is shown in Section 4, and the simulation results and their interpretation are presented in Section 5. Finally, conclusions are drawn in Section 6.

2. System Model

Gene regulatory networks can be modeled as static or dynamical systems. In this work, state-space modeling is considered, which is an instance of a dynamic modeling approach and can effectively cope with time variations. The states represent gene expressions, and their evolution in time, in general, can be expressed as

$$\mathbf{x}_k = g(\mathbf{x}_{k-1}) + \mathbf{w}_k, \quad k = 1, \ldots, K, \tag{1}$$

where $K$ is the total number of data points available, $\mathbf{w}_k$ is assumed to be a zero-mean Gaussian random variable with covariance $\mathbf{Q}_k = \sigma_w^2 \mathbf{I}$, and the function $g(\cdot)$ represents the regulatory relationship between the genes and is generally nonlinear. The microarray data is a set of noisy observations and is commonly expressed as a linear Gaussian model [41]:

$$\mathbf{y}_k = h(\mathbf{x}_k) + \mathbf{v}_k, \tag{2}$$

where $\mathbf{v}_k$ is a Gaussian-distributed random variable with zero mean and covariance $\mathbf{S}_k = \sigma_v^2 \mathbf{I}$ and incorporates the uncertainty in the microarray experiments. In order to capture the gene interactions effectively, the following nonlinear state evolution model is assumed [33, 34]:

$$x_{k,n} = \sum_{m=1}^{N} b_{nm} f(x_{k-1,m}) + w_{k,n}, \quad k = 1, \ldots, K, \; n = 1, \ldots, N, \tag{3}$$

where $N$ is the total number of genes in the network and $f(\cdot)$ is the sigmoid squash function

$$f(x_{k-1,m}) = \frac{1}{1 + e^{-x_{k-1,m}}}. \tag{4}$$

This particular choice for the nonlinear function ensures that the conditional distribution of the states remains Gaussian [41]. The multiplicative constants $b_{nm}$ quantify the positive or negative relations between various genes in the network. A positive value of $b_{nm}$ implies that the $m$th gene is activating the $n$th gene, whereas a negative value implies repression [34, 42]. The absolute value of these parameters indicates the strength of interaction.

The model given in (3) and (4), in the absence of any constraints, may be unidentifiable and may result in overfitted solutions [43]. Assumptions on network structures are therefore necessary to obtain a connectivity matrix that agrees with the biological knowledge. In a gene regulatory network (GRN), the genes are known to interact with few other genes only. To this end, the coefficients $b_{nm}$ are estimated using sparsity constraints, as explained in the next section.

A discrete linear Gaussian model for the microarray data is considered, which can be expressed at the $k$th time instant as [41]

$$\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k. \tag{5}$$
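To make the generative model in (3), (4), and (5) concrete, the following minimal sketch simulates hidden states and noisy observations for a given connectivity matrix. It assumes NumPy; the function names, the seed handling, and the argument layout are illustrative choices and are not taken from the paper. The default noise variances of $10^{-5}$ follow the value quoted for the simulations.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squash function of equation (4)."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate_grn(B, x0, K, sigma_w2=1e-5, sigma_v2=1e-5, seed=0):
    """Simulate equations (3) and (5) for K time points (illustrative helper).

    B[n, m] is the coefficient b_nm: the effect of gene m on gene n.
    Returns states x with shape (K + 1, N), where x[0] = x0, and
    observations y with shape (K, N), where y[k - 1] observes x[k].
    """
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=float)
    N = B.shape[0]
    x = np.empty((K + 1, N))
    y = np.empty((K, N))
    x[0] = x0
    for k in range(1, K + 1):
        w = rng.normal(0.0, np.sqrt(sigma_w2), N)   # system noise w_k
        v = rng.normal(0.0, np.sqrt(sigma_v2), N)   # measurement noise v_k
        x[k] = B @ sigmoid(x[k - 1]) + w            # equation (3)
        y[k - 1] = x[k] + v                         # equation (5)
    return x, y
```

A sparse `B`, with only a few nonzero entries per row, corresponds to the assumption that each gene is regulated by few other genes.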

Stacking the unknown parameters together, the parameter vector to be estimated is

$$\mathbf{b} \triangleq [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_N]^T, \tag{6}$$

where $\boldsymbol{\phi}_n = [b_{n1}, \ldots, b_{nN}]$. Plugging the values of states from (3) into (5), it follows that

$$\mathbf{y}_k = \mathbf{R}_k \mathbf{b} + \mathbf{e}_k, \tag{7}$$

where

$$\mathbf{R}_k \triangleq \begin{bmatrix} \mathbf{f}_k & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{f}_k & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{f}_k \end{bmatrix}, \tag{8}$$

$$\mathbf{f}_k \triangleq [f(x_{k-1,1}) \; \cdots \; f(x_{k-1,N})]. \tag{9}$$

Thus, the gene network inference problem boils down to the estimation of the system parameters $\mathbf{b}$ using the observations $\mathbf{y}_k$, where the effective noise $\mathbf{e}_k$ is the sum of the system and observation noises. The next section describes the proposed inference algorithm for sparse networks.
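The block-diagonal matrix in (8) is the Kronecker product of an identity matrix with the row vector $\mathbf{f}_k$, which gives a compact way to build it. The sketch below is an illustration under stated assumptions: the function name `regression_matrix` is hypothetical, and $\mathbf{b}$ is formed by stacking the rows of the coefficient matrix, consistent with (6).

```python
import numpy as np

def regression_matrix(x_prev):
    """Build R_k of equation (8) from the previous state x_{k-1}.

    Returns an N x N^2 block-diagonal matrix whose n-th row holds
    f_k = [f(x_{k-1,1}), ..., f(x_{k-1,N})] in block n.
    """
    f = 1.0 / (1.0 + np.exp(-np.asarray(x_prev, dtype=float)))  # equation (9)
    N = f.size
    return np.kron(np.eye(N), f[None, :])

# Noise-free check that R_k b reproduces the sum in equation (3).
B = np.array([[0.0, 2.0], [-1.0, 0.5]])   # example coefficients b_nm
x_prev = np.array([0.3, -1.2])
b = B.ravel()                             # rows stacked as in equation (6)
R = regression_matrix(x_prev)
f_k = 1.0 / (1.0 + np.exp(-x_prev))
assert np.allclose(R @ b, B @ f_k)
```

Because each row of $\mathbf{R}_k$ touches only the $N$ coefficients of one gene, the regression decouples across genes, which is one way to read the remark that parameter estimation involves much smaller matrices than a stacked state and parameter vector.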

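Figure 1: Block diagram of network inference methodology CKFS. (Blocks: input time-series data $\mathbf{y}$; initialize $\mathbf{b}_0$, $\mathbf{x}_0$; CKF; CSKF; decision $k = K$ with Yes and No branches; output. Signals: $\mathbf{y}_k$, $\mathbf{x}_k$, $\mathbf{b}_k$.)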

3. Method

In this section, the methodology proposed to infer the system parameters in (3) is described. The proposed cubature Kalman filter with sparsity constraints (CKFS) approach is illustrated in Figure 1. The specific details of this algorithm are presented next.

3.1. Cubature Kalman Filter. The Kalman filter is a Bayesian filter which provides the optimal solution to a general linear state-space inference problem depicted by (1) and (2) and follows a recursive predict-update process. The underlying assumption of Gaussianity for the predictive and the likelihood densities simplifies the Kalman filter algorithm to a two-step process consisting of prediction and update of the mean and covariance of the hidden states. However, the presence of nonlinear functions in the state and measurement equations requires the calculation of multidimensional integrals of the form nonlinear function × Gaussian density [36], which in general is computationally prohibitive. Several solutions to this problem have been proposed, including the EKF, which linearizes the nonlinear function by taking its first-order Taylor approximation, and the unscented Kalman filter (UKF), which approximates the probability density function (PDF) using a nonlinear transformation of the random variable. Recently, a new approach, the CKF, has been proposed, which evaluates the integrals numerically using spherical-radial cubature rules [36].

The following subsections briefly explain the working of Bayesian filtering and the CKF solution for the nonlinear multidimensional integrals.

3.1.1. Time Update. Let the observations up to the time instant $k$ be denoted by $\mathbf{d}_k$, that is, $\mathbf{d}_k \triangleq [\mathbf{y}_1^T, \ldots, \mathbf{y}_k^T]^T$. In the prediction phase, also called the time update of the Bayesian filter, the mean and covariance of the Gaussian predictive density are computed as follows:

$$\begin{aligned} \hat{\mathbf{x}}_{k|k-1} &= E[g(\mathbf{x}_{k-1}) \mid \mathbf{d}_{k-1}], \\ \mathbf{P}_{xx,k|k-1} &= E[g(\mathbf{x}_{k-1})\, g^T(\mathbf{x}_{k-1}) \mid \mathbf{d}_{k-1}] - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{x}}_{k|k-1}^T + \mathbf{Q}_k, \end{aligned} \tag{10}$$

where $E$ denotes the expectation operator and $\mathbf{x}_{k-1}$, conditioned on $\mathbf{d}_{k-1}$, is normally distributed with parameters $(\hat{\mathbf{x}}_{k-1|k-1}, \mathbf{P}_{xx,k-1|k-1})$. The noise term does not appear inside the expectation for the mean because the Gaussian noise $\mathbf{w}$ is zero-mean and independent of $\mathbf{d}_{k-1}$. The estimates $\hat{\mathbf{x}}_{k-1|k-1}$ and $\mathbf{P}_{xx,k-1|k-1}$ are assumed to be available from the previous iteration. Here $\mathbf{P}_{xx,k|k-1}$ is an estimate of the error covariance matrix.

312 Measurement Update Since the measurement noiseis also Gaussian the likelihood density is given by y119896minus1 |

d119896minus1 N(z119896minus1 y119896|119896minus1P119909119909119896|119896minus1) As themeasurements becomeavailable at the 119896th time instant the mean and covariance ofthe likelihood density are calculated as follows

y119896|119896minus1 = 119864 [y119896 | d119896minus1]

P119910119910119896|119896minus1 = 119864 [x119896x119879

119896] minus y119896|119896minus1y

119879

119896|119896minus1+ S119896minus1

(11)

The updated posterior density, obtained from the conditional joint density of the states and the measurements, can be expressed as

$$
\left([\mathbf{x}_k^T \;\; \mathbf{y}_k^T]^T \mid \mathbf{d}_{k-1}\right) \sim \mathcal{N}\left(
\begin{pmatrix} \hat{\mathbf{x}}_{k|k-1} \\ \hat{\mathbf{y}}_{k|k-1} \end{pmatrix},
\begin{pmatrix} \mathbf{P}_{xx,k|k-1} & \mathbf{P}_{xy,k|k-1} \\ \mathbf{P}_{xy,k|k-1}^T & \mathbf{P}_{yy,k|k-1} \end{pmatrix}
\right),
\tag{12}
$$

where

$$
\mathbf{P}_{xy,k|k-1} = E[\mathbf{x}_k\mathbf{x}_k^T \mid \mathbf{d}_{k-1}] - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{y}}_{k|k-1}^T
\tag{13}
$$

is the cross-covariance matrix between the states and the measurements. Hence, the states and the corresponding error covariance matrix are updated by calculating the innovation $\mathbf{y}_k - \hat{\mathbf{y}}_{k|k-1}$ and the Kalman gain $\mathbf{K}_{G_k}$:

$$
\begin{aligned}
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_{G_k}(\mathbf{y}_k - \hat{\mathbf{y}}_{k|k-1}), \\
\mathbf{P}_{xx,k|k} &= \mathbf{P}_{xx,k|k-1} - \mathbf{K}_{G_k}\mathbf{P}_{yy,k|k-1}\mathbf{K}_{G_k}^T, \\
\mathbf{K}_{G_k} &= \mathbf{P}_{xy,k|k-1}\mathbf{P}_{yy,k|k-1}^{-1}.
\end{aligned}
\tag{14}
$$
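The update in (14) can be sketched in a few lines for the linear observation model $\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k$ of (5). This is a minimal illustration rather than the authors' implementation: the function name `measurement_update` and its arguments are hypothetical, and the simplifications $\hat{\mathbf{y}}_{k|k-1} = \hat{\mathbf{x}}_{k|k-1}$, $\mathbf{P}_{yy} = \mathbf{P}_{xx} + \mathbf{S}$, and $\mathbf{P}_{xy} = \mathbf{P}_{xx}$ are assumed to follow from that observation model.

```python
import numpy as np

def measurement_update(x_pred, P_pred, y, S):
    """Kalman-type update (14), assuming the observation model y_k = x_k + v_k."""
    y_pred = x_pred                      # predicted measurement: v_k is zero mean
    P_yy = P_pred + S                    # innovation covariance for y = x + v
    P_xy = P_pred                        # cross covariance between x_k and y_k
    K = P_xy @ np.linalg.inv(P_yy)       # Kalman gain
    x_upd = x_pred + K @ (y - y_pred)    # correct the mean with the innovation
    P_upd = P_pred - K @ P_yy @ K.T      # reduce the error covariance
    return x_upd, P_upd

# Scalar example: equal prior and noise variance gives a gain of one half.
x_upd, P_upd = measurement_update(np.array([0.0]), np.array([[1.0]]),
                                  np.array([2.0]), np.array([[1.0]]))
```

With equal prior and measurement variances, the updated mean lies halfway between the prediction and the observation, which is a quick sanity check on the gain.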

The next subsection briefly describes the computation of the high-dimensional integrals present in the equations above.

3.1.3. Computation of Integrals Using Spherical-Radial Cubature Points. In order to determine the expectations in (10) using a numerical integration method, a spherical-radial cubature rule is applied. This method calculates the cubature points $\mathbf{X}_{j,k-1|k-1}$ as follows [36]:

$$
\mathbf{X}_{j,k-1|k-1} = \mathbf{U}_{k-1|k-1}\boldsymbol{\zeta}_j + \hat{\mathbf{x}}_{k-1|k-1},
\tag{15}
$$

where $\boldsymbol{\zeta}_j = \sqrt{\ell/2}\,[\mathbf{1}]_j$, $j = 1, \ldots, \ell$, $\ell = 2N$ denotes the total number of cubature points, $[\mathbf{1}]_j$ is the $j$th column of $[\mathbf{I}_N, -\mathbf{I}_N]$ [36], and $\mathbf{U}_{k-1|k-1}$ stands for the square root of the error covariance matrix, that is,

$$
\mathbf{P}_{xx,k-1|k-1} = \mathbf{U}_{k-1|k-1}\mathbf{U}_{k-1|k-1}^T.
\tag{16}
$$

The cubature points are propagated through the state equation:

$$
\mathbf{X}^{*}_{j,k|k-1} = \mathbf{g}(\mathbf{X}_{j,k-1|k-1}).
\tag{17}
$$

The propagated cubature points yield the state and error covariance estimates:

$$
\begin{aligned}
\hat{\mathbf{x}}_{k|k-1} &= \frac{1}{\ell}\sum_{j=1}^{\ell}\mathbf{X}^{*}_{j,k|k-1}, \\
\mathbf{P}_{xx,k|k-1} &= \frac{1}{\ell}\sum_{j=1}^{\ell}\mathbf{X}^{*}_{j,k|k-1}\mathbf{X}^{*T}_{j,k|k-1} - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{x}}_{k|k-1}^T + \mathbf{Q}_{k-1}.
\end{aligned}
\tag{18}
$$

The integrals in (11) and (13) can be evaluated in a similar manner. The next subsection explains the estimation of the parameters in the system.
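The cubature rule in (15)-(18) can be illustrated with a short NumPy sketch. The function names `cubature_points` and `ckf_time_update` are hypothetical, the Cholesky factor is assumed as the matrix square root in (16), and the transition function `g` is passed in by the caller; none of these choices is prescribed by the text above.

```python
import numpy as np

def cubature_points(x_hat, P):
    """Return the 2N cubature points of (15) as the rows of a (2N, N) array."""
    N = x_hat.size
    U = np.linalg.cholesky(P)                    # P = U U^T, one valid square root for (16)
    unit = np.hstack([np.eye(N), -np.eye(N)])    # columns are the unit points [1]_j
    zeta = np.sqrt(N) * unit                     # sqrt(l / 2) with l = 2N
    return (U @ zeta).T + x_hat

def ckf_time_update(x_hat, P, g, Q):
    """Propagate the points through g (17) and form the predicted moments (18)."""
    pts = cubature_points(x_hat, P)
    prop = np.array([g(p) for p in pts])         # propagated cubature points
    x_pred = prop.mean(axis=0)                   # equally weighted average
    P_pred = prop.T @ prop / prop.shape[0] - np.outer(x_pred, x_pred) + Q
    return x_pred, P_pred

# For a linear transition the rule reproduces the standard Kalman prediction.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
x_hat = np.array([1.0, -2.0])
P = np.array([[0.5, 0.1], [0.1, 0.3]])
Q = 0.01 * np.eye(2)
x_pred, P_pred = ckf_time_update(x_hat, P, lambda x: A @ x, Q)
```

The linear case is used here only as a check, since the cubature rule is exact for it; for the sigmoid-based model the same functions apply with a nonlinear `g`.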

3.2. Estimation of Sparse Parameters Using Kalman Filter. The state estimates are obtained using the CKF as described in the previous subsection. In order to estimate the unknown parameters in the system model, one of the most commonly used methods involves stacking the parameters with the states and estimating them together. The estimation process performed in this manner is called joint estimation. Another method for the estimation of parameters consists of a two-step recursive process, which is termed dual estimation. This process estimates the states in the first step and, with the assumption that the states are known, estimates the parameters in the second step. These steps are repeated until the algorithm converges to the true values or until the available observations are exhausted. This paper makes use of the latter technique.

The vector $\mathbf{b}$, as defined in (6), is assumed to evolve as a Gauss-Markov model. As discussed previously, the states are assumed to be known at this step. The system evolution equations can therefore be expressed as

$$
\begin{aligned}
\mathbf{b}_k &= \mathbf{b}_{k-1} + \boldsymbol{\eta}_{k-1}, \\
\mathbf{y}_k &= \mathbf{R}_k\mathbf{b}_k + \mathbf{e}_k,
\end{aligned}
\tag{19}
$$

where $\boldsymbol{\eta}_k$ stands for the i.i.d. Gaussian noise and $\mathbf{R}_k$ is as defined in (8). It is observed that (19) is a system of linear equations with additive Gaussian noise, and therefore the Kalman filter is the optimal choice for the estimation of the parameter vector. The standard predict and update steps involved in the Kalman filter are summarized as follows:

$$
\begin{aligned}
\hat{\mathbf{b}}_{k|k-1} &= \hat{\mathbf{b}}_{k-1|k-1}, \\
\mathbf{P}_{bb,k|k-1} &= \mathbf{P}_{bb,k-1|k-1} + \boldsymbol{\Sigma}_{\eta_k}, \\
\mathbf{u}_k &= \mathbf{y}_k - \mathbf{R}_{f_k}\hat{\mathbf{b}}_{k|k-1}, \\
\mathbf{K}_G &= \mathbf{P}_{bb,k|k-1}\mathbf{R}_{f_k}^T\left(\mathbf{R}_{f_k}\mathbf{P}_{bb,k|k-1}\mathbf{R}_{f_k}^T + \sigma_e^2\mathbf{I}\right)^{-1}, \\
\hat{\mathbf{b}}_{k|k} &= \hat{\mathbf{b}}_{k|k-1} + \mathbf{K}_G\mathbf{u}_k, \\
\mathbf{P}_{bb,k|k} &= (\mathbf{I} - \mathbf{K}_G\mathbf{R}_{f_k})\mathbf{P}_{bb,k|k-1},
\end{aligned}
\tag{20}
$$

where $\mathbf{K}_G$ denotes the Kalman gain and $\mathbf{P}_{bb}$ represents the error covariance matrix of the parameter estimate. Because the noise $\boldsymbol{\eta}$ is zero-mean, the predicted parameter vector equals the previous estimate, and the noise enters only through the covariance $\boldsymbol{\Sigma}_{\eta_k}$.
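One predict and update cycle of the parameter filter can be sketched as follows. The function name `kf_parameter_step` and its argument names are hypothetical, and the regressor matrix `R` is taken as given (its construction from the state estimates is defined elsewhere in the paper and is not reproduced here). The prediction keeps the previous estimate, which is the usual choice for a random-walk model with zero-mean noise.

```python
import numpy as np

def kf_parameter_step(b_prev, P_prev, R, y, Sigma_eta, sigma_e2):
    """One Kalman filter cycle for the random-walk parameter model (19)."""
    b_pred = b_prev                                  # zero-mean noise: mean is unchanged
    P_pred = P_prev + Sigma_eta                      # uncertainty grows by the process noise
    u = y - R @ b_pred                               # innovation
    S = R @ P_pred @ R.T + sigma_e2 * np.eye(R.shape[0])
    K = P_pred @ R.T @ np.linalg.inv(S)              # Kalman gain
    b_upd = b_pred + K @ u
    P_upd = (np.eye(b_prev.size) - K @ R) @ P_pred
    return b_upd, P_upd

# Scalar example with unit prior variance and unit measurement noise.
b_upd, P_upd = kf_parameter_step(np.array([0.0]), np.array([[1.0]]),
                                 np.array([[1.0]]), np.array([1.0]),
                                 np.array([[0.0]]), 1.0)
```

In the scalar example the gain is one half, so the estimate moves halfway toward the observation and the variance halves.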

The Kalman filter algorithm is based on an $\ell_2$-norm minimization criterion. As gene networks are known to be highly sparse, the parameter vector is expected to have only a few nonzero values. A more accurate approach for estimating such a vector would be to introduce an additional constraint on its $\ell_1$-norm, which is the core idea in compressed sensing [38, 44]. Such an $\ell_1$-norm constraint provides a unique solution to the underdetermined set of equations [45]. Therefore, instead of a simple $\ell_2$-norm minimization, the following constrained optimization problem is considered:

$$
\min_{\hat{\mathbf{b}}_k} \left\| \mathbf{b}_k - \hat{\mathbf{b}}_k \right\|_2^2 \quad \text{s.t.} \quad \left\| \hat{\mathbf{b}}_k \right\|_1 \le \epsilon.
\tag{21}
$$

The importance of this constraint can be judged by the fact that, without it, the system would be rendered unidentifiable [43].

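The problem (21) can be solved using a pseudomeasurement (PM) method, which incorporates the inequality constraint in (21) into the filtering process by assuming an artificial measurement $\|\hat{\mathbf{b}}_k\|_1 - \epsilon = 0$. This is concisely expressed as

$$
0 = \mathbf{R}_\tau\hat{\mathbf{b}}_k - \epsilon, \qquad \mathbf{R}_\tau = [\operatorname{sign}(\hat{b}_\tau(1)), \ldots, \operatorname{sign}(\hat{b}_\tau(N))].
\tag{22}
$$

Here $\mathbf{R}_\tau$ is a row vector of signs and should not be confused with the regressor matrix $\mathbf{R}_k$ in (19). The value of the covariance matrix $\boldsymbol{\Sigma}_\epsilon = \sigma_\epsilon^2\mathbf{I}$ of the pseudonoise $\epsilon$ is selected in a similar manner as the process noise covariance in the EKF algorithm. However, it is found that large values of the variance, that is, $\sigma_\epsilon^2 \ge 100$, prove sufficient in most cases [38]. Further details on selecting these parameters can be found in [38, 46]. The PM method solves (21) in a recursive manner for $K_\tau$ iterations using the following steps:

$$
\begin{aligned}
\mathbf{K}_G^\tau &= \mathbf{P}_\tau\mathbf{R}_\tau^T\left(\mathbf{R}_\tau\mathbf{P}_\tau\mathbf{R}_\tau^T + \boldsymbol{\Sigma}_\epsilon\right)^{-1}, \\
\hat{\mathbf{b}}_{\tau+1} &= (\mathbf{I} - \mathbf{K}_G^\tau\mathbf{R}_\tau)\hat{\mathbf{b}}_\tau, \\
\mathbf{P}_{\tau+1} &= (\mathbf{I} - \mathbf{K}_G^\tau\mathbf{R}_\tau)\mathbf{P}_\tau.
\end{aligned}
\tag{23}
$$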

At each $k$th time instant, $\mathbf{P}_{bb,k|k}$ and $\hat{\mathbf{b}}_{k|k}$ obtained from (20) are considered as the initial values, that is, $\hat{\mathbf{b}}_1 = \hat{\mathbf{b}}_{k|k}$ and $\mathbf{P}_1 = \mathbf{P}_{bb,k|k}$, which is the error covariance matrix. The value of $K_\tau$ is equal to the number of constraints, that is, the expected number of nonzero $b_{nm}$'s in the system. Possible ways of calculating $K_\tau$ include the minimum description length (MDL) principle and the Bayesian information criterion (BIC).

(1) Input: time series data set $\mathbf{y}$
(2) Initialize $I$, $K$, $\boldsymbol{\phi}_0$, $\mathbf{x}_0$
(3) for $k = 1, \ldots, K$ do
(4)   Find the state estimates using CKF, following the time and measurement update steps in Section 3
(5)   Estimate parameters $\mathbf{b}_k$ from $\mathbf{x}_k$ and $\mathbf{y}_k$ using (20)
(6)   for $\tau = 1, \ldots, K_\tau$ do
(7)     Update the parameters $\mathbf{b}_k$ using (23)
(8)   end for
(9) end for
(10) return

Algorithm 1: Network inference: CKFS.
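The pseudomeasurement iterations in (23) can be sketched as below. The function name `pm_refine` and the default `sigma_eps2=100.0` are illustrative assumptions (the latter echoes the range the text reports as usually sufficient). Because the pseudomeasurement is scalar, the gain needs only a scalar division, so the sign row vector is handled as a one-dimensional array.

```python
import numpy as np

def pm_refine(b0, P0, n_iter, sigma_eps2=100.0):
    """Apply n_iter pseudomeasurement updates (23) that shrink the l1-norm of b."""
    b = np.array(b0, dtype=float)
    P = np.array(P0, dtype=float)
    for _ in range(n_iter):
        r = np.sign(b)                        # sign row vector of (22)
        s = r @ P @ r + sigma_eps2            # scalar innovation variance
        gain = P @ r / s                      # Kalman gain as a 1-D array
        b = b - gain * (r @ b)                # (I - K R) b, with r @ b equal to ||b||_1
        P = P - np.outer(gain, r @ P)         # (I - K R) P
    return b, P

# Hypothetical estimate with two dominant and two near-zero coefficients.
b0 = np.array([1.0, 0.05, -0.8, 0.02])
b1, P1 = pm_refine(b0, 0.1 * np.eye(4), n_iter=1)
b5, P5 = pm_refine(b0, 0.1 * np.eye(4), n_iter=5)
```

Each iteration pulls every nonzero coefficient toward zero by an amount governed by its own uncertainty in `P`, so the size of the shrinkage depends on the ratio between the parameter covariance and the pseudonoise variance.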

3.3. Inference Algorithm. The network inference algorithm is summarized in Algorithm 1. The algorithm consists of a recursive process which repeats itself for the number of observations present in the time-series data. For each time sample, the state estimate is obtained using the CKF and the parameter estimate is obtained using the KF. Since the parameters are expected to be sparse, the estimates are then refined further using the compressed sensing-based KF (CSKF) steps in (23). This iterative process results in a simple and accurate algorithm for gene network inference while considering a complex nonlinear model.
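The overall loop of Algorithm 1 can be sketched end to end as follows. This is an illustrative reading of the steps, not the authors' code: the function `ckfs_sketch`, all default variances, the initialization, and in particular the regressor `R = kron(I, f(x)^T)` and the combined noise variance `q + s` are assumptions derived here from the model $x_k = B f(x_{k-1}) + w_k$, $y_k = x_k + v_k$.

```python
import numpy as np

def f(x):
    """Sigmoid squash function."""
    return 1.0 / (1.0 + np.exp(-x))

def ckfs_sketch(Y, n_pm=3, q=1e-5, s=1e-5, sigma_eta2=1e-4, sigma_eps2=100.0):
    """Dual estimation: CKF for the states, KF plus PM refinement for the parameters."""
    K, N = Y.shape
    x, Pxx = Y[0].copy(), s * np.eye(N)          # start the state at the first observation
    b, Pbb = np.zeros(N * N), np.eye(N * N)      # stacked rows of the coefficient matrix
    unit = np.sqrt(N) * np.hstack([np.eye(N), -np.eye(N)])
    for k in range(1, K):
        B = b.reshape(N, N)
        # State estimation with the CKF.
        pts = (np.linalg.cholesky(Pxx) @ unit).T + x      # cubature points
        prop = f(pts) @ B.T                               # B f(point) for every point
        x_pred = prop.mean(axis=0)
        P_pred = prop.T @ prop / (2 * N) - np.outer(x_pred, x_pred) + q * np.eye(N)
        P_yy = P_pred + s * np.eye(N)                     # because y_k = x_k + v_k
        gain = P_pred @ np.linalg.inv(P_yy)
        x_new = x_pred + gain @ (Y[k] - x_pred)
        Pxx_new = P_pred - gain @ P_yy @ gain.T
        # Parameter estimation with the KF, treating the previous state as known.
        R = np.kron(np.eye(N), f(x)[None, :])             # assumed regressor: y_k ~ R b
        Pbb = Pbb + sigma_eta2 * np.eye(N * N)
        G = Pbb @ R.T @ np.linalg.inv(R @ Pbb @ R.T + (q + s) * np.eye(N))
        b = b + G @ (Y[k] - R @ b)
        Pbb = (np.eye(N * N) - G @ R) @ Pbb
        # Sparsity refinement with pseudomeasurements.
        for _ in range(n_pm):
            r = np.sign(b)
            g = Pbb @ r / (r @ Pbb @ r + sigma_eps2)
            b = b - g * (r @ b)
            Pbb = Pbb - np.outer(g, r @ Pbb)
        x, Pxx = x_new, Pxx_new
    return b.reshape(N, N), x

# Toy two-gene data generated from the same model (hypothetical coefficients).
rng = np.random.default_rng(0)
B_true = np.array([[0.0, 1.0], [-1.0, 0.5]])
Y = np.zeros((30, 2))
x_true = np.zeros(2)
for k in range(30):
    x_true = B_true @ f(x_true) + 1e-3 * rng.standard_normal(2)
    Y[k] = x_true + 1e-3 * rng.standard_normal(2)
B_hat, x_hat = ckfs_sketch(Y)
```

The sketch fixes the number of PM iterations by hand; in the paper this number is tied to the expected number of nonzero coefficients, so a model selection step would replace the `n_pm` argument in practice.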

4. Cramér-Rao Bound

The performance of an estimator can be judged by comparing it with theoretical lower bounds proposed in parameter estimation theory. The Cramér-Rao bound (CRB) establishes a lower bound on the MSE of an unbiased estimator [47]. In particular, the CRB states that the covariance matrix of the estimator $\hat{\mathbf{b}}$ is lower bounded by

$$
E\left[(\hat{\mathbf{b}} - \mathbf{b})(\hat{\mathbf{b}} - \mathbf{b})^T\right] \succeq [\mathbf{I}(\mathbf{b})]^{-1},
\tag{24}
$$

where the matrix inequality $\succeq$ is to be interpreted in the semidefinite sense and $\mathbf{I}(\mathbf{b})$ is the Fisher information matrix (FIM):

$$
\mathbf{I}(\mathbf{b}) = E\left[\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^T\right].
\tag{25}
$$

The CRB for gene network inference can be calculated as follows. By stacking all the observations for $k = 1, \ldots, K$, (7) can be written compactly in the matrix form

$$
\mathbf{y} = \mathbf{R}\mathbf{b} + \mathbf{e},
\tag{26}
$$

where $\mathbf{y} = [\mathbf{y}_1^T, \ldots, \mathbf{y}_K^T]^T$, $\mathbf{R} = [\mathbf{R}_1^T, \ldots, \mathbf{R}_K^T]^T$, and $\mathbf{e} = [\mathbf{e}_1^T, \ldots, \mathbf{e}_K^T]^T$. The PDF $p(\mathbf{y} \mid \mathbf{b})$ is expressed as

$$
p(\mathbf{y} \mid \mathbf{b}) = C\exp\left(-\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T(\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2}\right),
\tag{27}
$$

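where $C$ is a constant. The derivative of $\ln p(\mathbf{y} \mid \mathbf{b})$ can be expressed as

$$
\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} = -\frac{\partial}{\partial \mathbf{b}}\left[\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T(\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2}\right] = \frac{\mathbf{R}^T\mathbf{y} - \mathbf{R}^T\mathbf{R}\mathbf{b}}{\sigma_e^2}.
\tag{28}
$$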

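It now follows that

$$
\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^T = \frac{\mathbf{R}^T(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^T\mathbf{R}}{\sigma_e^4}.
\tag{29}
$$

By taking the expectation of (29) and using $E[(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^T] = \sigma_e^2\mathbf{I}$, the FIM in (25) is given by

$$
\mathbf{I}(\mathbf{b}) = \frac{\mathbf{R}^T\mathbf{R}}{\sigma_e^2}.
\tag{30}
$$

The inverse of the FIM in (30) can be used to place a lower bound on the estimation error of the parameter vector $\mathbf{b}$. Figure 2 shows the comparison of the MSE of the CKFS algorithm with the CRB as a function of the number of samples $K$ for one representative gene from the eight-gene network considered in Section 5.1. It is observed that the MSE of the estimated parameters decreases with an increasing number of samples.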
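The bound implied by (30) is straightforward to evaluate numerically once the stacked regressor matrix is available. In the sketch below the function name `cramer_rao_bound` is hypothetical, and $\mathbf{R}$ is assumed to have full column rank so that the FIM is invertible.

```python
import numpy as np

def cramer_rao_bound(R, sigma_e2):
    """Return the inverse FIM (30), a lower bound on the estimator covariance (24)."""
    fim = R.T @ R / sigma_e2           # Fisher information matrix
    return np.linalg.inv(fim)          # requires R to have full column rank

# Small stacked regressor with three observations of two parameters.
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
crb = cramer_rao_bound(R, sigma_e2=0.03)
per_parameter_bound = np.diag(crb)     # bound on the MSE of each unbiased estimate
```

The diagonal entries bound the MSE of the individual parameters, which is the quantity compared against the CKFS error in Figure 2. Stacking more rows into `R` can only shrink these entries, in line with the decrease of the bound as the number of samples grows.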

5. Results and Discussion

The simulation results of the CKFS algorithm are discussed in this section. The performance is first tested on synthetic data obtained from randomly generated Boolean networks under various scenarios and performance metrics. The algorithm is then assessed on the DREAM4 networks and the IRMA network.

5.1. Synthetic Data. Time-series data is produced from randomly generated Boolean networks using the system model (3) and (5). Two scenarios are considered for this purpose.

First, the comparison is performed by varying the sample size while keeping the network size fixed. The gene network consists of 8 genes and 20 edges. In terms of network estimation, if the algorithm predicts an edge between two nodes that is not present in reality, an error referred to as a false alarm error (F) is said to have occurred. Another situation is the indication of the absence of an edge in the graph which is in fact present in the real network. This kind of error is termed missed detection (M). The sum of these two errors, normalized over the total number of edges in the network, yields the Hamming distance. It is also important to consider the probability of predicting the true connections correctly, which is assessed by the true connections (T) metric. An algorithm with low Hamming distance and small false alarm error is particularly desirable, as predicting an edge erroneously can be troublesome for biologists. True connections indicate the reliability of the predictions.

Figure 3 illustrates the performance of the CKFS algorithm and that of the EKF algorithm proposed in [34] in terms of the metrics described above. It is important to mention here that the same system model is assumed by both the CKFS and EKF algorithms for the purpose of this simulation. These metrics are the same as those used in [15]. The variances of the system and measurement noises, $\sigma_w^2$ and $\sigma_v^2$, respectively, are taken to be $10^{-5}$ in all the simulations and are assumed to be known. It is noticed that EKF has a slightly lower false alarm rate when the number of samples is small; however, as the number of samples increases, CKFS yields a lower false alarm error. The Hamming distance for CKFS is also smaller than that for EKF, indicating a smaller cumulative error. True connections show a consistent behavior for the two algorithms when the number of samples is increased, with CKFS able to predict connections more accurately. These experiments show the superiority of CKFS in terms of lower error rate.

Figure 2: Cramér-Rao bound on the estimation of parameters ($\log_{10}$ MSE versus sample size for CRB and CKFS). The MSE for one of the representative $\theta$ is shown here for a network consisting of 8 vertices.
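The error metrics defined above can be computed directly from the true and the predicted adjacency matrices. The sketch below is a minimal illustration: the function name `edge_metrics` and the boolean-matrix encoding are hypothetical, and every metric is assumed to be normalized by the number of true edges, as stated for Figure 3.

```python
import numpy as np

def edge_metrics(pred, truth):
    """False alarms, missed detections, Hamming distance, and true connections.

    pred and truth are boolean arrays whose entry [n, m] marks an edge from gene m to gene n.
    All counts are divided by the number of edges in the true network.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    n_edges = truth.sum()
    false_alarm = np.sum(pred & ~truth)      # F: predicted edge absent from the true network
    missed = np.sum(~pred & truth)           # M: true edge that was not predicted
    true_conn = np.sum(pred & truth)         # T: true edge that was predicted
    return {
        "F": false_alarm / n_edges,
        "M": missed / n_edges,
        "hamming": (false_alarm + missed) / n_edges,
        "T": true_conn / n_edges,
    }

# Toy three-gene network with four true edges, one false alarm, and one miss.
truth = np.array([[0, 1, 0], [1, 0, 0], [0, 1, 1]], dtype=bool)
pred = np.array([[0, 1, 1], [0, 0, 0], [0, 1, 1]], dtype=bool)
metrics = edge_metrics(pred, truth)
```

In this toy case three of the four true edges are recovered, so T is 0.75 and the Hamming distance is 0.5.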

To obtain a more rigorous evaluation, the performance of the algorithms is then compared in a scenario which considers the sample size to be fixed and assumes networks of different sizes. The receiver operating characteristic (ROC) curves are plotted as performance measures. A higher area under the ROC curve (AUROC) shows more true positives for a given false positive rate and therefore indicates better classification. The performance of CKFS$(N, E, K)$ and EKF$(N, E, K)$ is shown in Figure 4, where $N$ stands for the number of nodes, $E$ represents the number of edges, and $K$ denotes the number of time points. It is observed that CKFS outperforms the EKF for networks of different sizes.

Figure 3: (a) False alarm errors, (b) Hamming distance, and (c) true connections, each plotted against the sample size for EKF and CKFS. The synthetic networks consist of 8 vertices and 20 edges. The metric is normalized over the number of edges. CKFS gives lower error and predicts more true connections with the increase in the sample size of data.

The complexity of the two algorithms is compared for synthetically generated networks with the number of genes equal to 10, 20, 30, and 40. The sample size is kept to 50 time points for each of these networks, and the run time for the EKF and CKFS algorithms is calculated as shown in Table 1. It is noted that EKF is faster for smaller network sizes, but as the network size increases its run time becomes much larger than that of CKFS. The main reason for this is that EKF [34] estimates the states and parameters by stacking them together, which requires large-sized matrix multiplications at each iteration. The benefit associated with performing dual estimation, as in CKFS, is that the parameters are estimated separately from the states. Since the system is linear and one-to-one for the parameters, inversion of much smaller matrices can be performed, reducing the computational complexity of the CKFS algorithm. CKFS is therefore particularly attractive for large-sized networks.

Figure 4: ROC curves (true positive rate versus false positive rate) for the performance of CKFS and EKF using synthetic data with $(N, E, K)$ equal to (a) (5, 10, 20), (b) (10, 12, 20), and (c) (15, 19, 20). The area under the ROC curve for CKFS is more than that for EKF for various sized networks.

Table 1: Run time in seconds for EKF and CKFS algorithms for varying network sizes for synthetically generated data. The number of sample points is fixed to 50.

| Number of genes | 10 | 20 | 30 | 40 |
|---|---|---|---|---|
| EKF | 0.16 | 1.9 | 16.5 | 84 |
| CKFS | 1.2 | 4.3 | 11.5 | 24.1 |

5.2. DREAM4 Gene Networks. Several in silico networks have been produced in order to benchmark the performance of gene network inference algorithms. Dialogue on reverse engineering assessment and methods (DREAM) in silico networks serve as one of the popular methods used for this purpose [39, 48]. In this section, the performance of the CKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge.

Five networks are produced using the known GRNs of Escherichia coli and Saccharomyces cerevisiae. The data set for each of the 10-gene networks consists of 21 data points for five different perturbations. The inference is performed by using all the perturbations. The 100-gene network consists of data sets for ten perturbations. AUROC and area under the precision-recall curve (AUPR) are calculated for the five networks of both data sets and shown in Tables 2 and 3, respectively. The quantities precision and recall are defined as $P = T/(T + F)$ and $R = T/(T + M)$, respectively. For comparison purposes, the values of the two quantities for the time-series network identification (TSNI) algorithm, which exploits ordinary differential equations, are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for DREAM4 10-gene networks for the five different networks.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.62 (0.27) | 0.63 (0.32) | 0.58 (0.21) | 0.63 (0.23) | 0.68 (0.25) |
| CKFS | 0.63 (0.40) | 0.67 (0.50) | 0.72 (0.50) | 0.75 (0.49) | 0.81 (0.42) |
| Random [39] | 0.55 (0.18) | 0.55 (0.19) | 0.55 (0.17) | 0.57 (0.17) | 0.56 (0.16) |

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for DREAM4 100-gene networks for the five different networks.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.55 (0.02) | 0.55 (0.03) | 0.60 (0.03) | 0.54 (0.02) | 0.59 (0.03) |
| CKFS | 0.67 (0.13) | 0.57 (0.08) | 0.60 (0.10) | 0.62 (0.10) | 0.60 (0.07) |
| Random [39] | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) |

Figure 5: The inferred IRMA networks over the genes GAL80, GAL4, SWI5, CBF1, and ASH1. (a) Gold standard, (b) inferred network using CKFS, and (c) inferred network using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate the edges that are correct but whose directions are reversed, and red arrows indicate false positives.
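Precision, recall, and AUROC as used in this subsection can be sketched from edge scores as follows. The function names and the example scores are hypothetical, and the AUROC is computed here through its pairwise-ranking interpretation (the probability that a true edge receives a higher score than a non-edge, with ties counted as one half), which is one standard way to obtain it and not necessarily the procedure used for Tables 2 and 3.

```python
import numpy as np

def precision_recall(pred, truth):
    """Precision P = T / (T + F) and recall R = T / (T + M) from boolean edge labels."""
    T = np.sum(pred & truth)       # true connections
    F = np.sum(pred & ~truth)      # false alarms
    M = np.sum(~pred & truth)      # missed detections
    # Assumes at least one predicted edge and at least one true edge.
    return T / (T + F), T / (T + M)

def auroc(scores, truth):
    """Area under the ROC curve via pairwise comparison of edge and non-edge scores."""
    pos = scores[truth]
    neg = scores[~truth]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Four candidate edges with confidence scores, two of which are true edges.
scores = np.array([0.9, 0.8, 0.3, 0.1])
truth = np.array([True, False, True, False])
p, r = precision_recall(scores > 0.5, truth)
area = auroc(scores, truth)
```

Sweeping the threshold applied to `scores` and recording the resulting precision and recall pairs would trace out the precision-recall curve whose area is reported as AUPR.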

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which consists of expression samples at 21 time points. The switch-on data consists of 16 sample points and is obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be the same as in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because the genes are known to interact with only a few other genes. The use of the CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS offers an advantage over EKF in terms of smaller run time for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using the DREAM4 10-gene and 100-gene networks and the IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, “Computational systems biology,” Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, “Stochastic modeling and simulation of gene networks,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, “Understanding micro-RNA regulation: a computational perspective,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, “Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, “Current approaches to gene regulatory network modelling,” BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, “Inferring quantitative models of regulatory networks from expression data,” Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, “Clustering time series gene expression data based on sum-of-exponentials fitting,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, “Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure,” Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, “Identifying differentially expressed genes in microarray experiments with model-based variance estimation,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, “Gene clustering based on cluster-wide mutual information,” Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring connectivity of genetic regulatory networks using information-theoretic criteria,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, “Inference of gene regulatory networks based on a universal minimum description length,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring gene regulatory networks from time series data using the minimum description length principle,” Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling Gene Expression Data Using Dynamic Bayesian Networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D'Alché-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d'Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS '11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C. A. Penfold and D. L. Wild, “How to infer gene networks from expression profiles, revisited,” Interface Focus, pp. 857–870, 2011.

[40] I. Cantone, L. Marucci, F. Iorio et al., “A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches,” Cell, vol. 137, no. 1, pp. 172–181, 2009.

[41] Y. Huang, I. M. Tienda-Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 76–97, 2009.

[42] Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.

[43] H. Xiong and Y. Choe, “Structural systems identification of genetic regulatory networks,” Bioinformatics, vol. 24, no. 4, pp. 553–560, 2008.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[45] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[46] J. D. Geeter, H. V. Brussel, and J. D. Schutter, “A smoothly constrained Kalman filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1171–1177, 1997.

[47] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New York, NY, USA, 1993.

[48] http://wiki.c2b2.columbia.edu/dream.


the gene interactions effectively, the following nonlinear state evolution model is assumed [33, 34]:

$$x_{k,n} = \sum_{m=1}^{N} b_{nm} f(x_{k-1,m}) + w_{k,n}, \quad k = 1, \dots, K, \quad n = 1, \dots, N, \tag{3}$$

where $N$ is the total number of genes in the network and $f(\cdot)$ is the sigmoid squash function

$$f(x_{k-1,m}) = \frac{1}{1 + e^{-x_{k-1,m}}}. \tag{4}$$

This particular choice for the nonlinear function ensures that the conditional distribution of the states remains Gaussian [41]. The multiplicative constants $b_{nm}$ quantify the positive or negative relations between various genes in the network. A positive value of $b_{nm}$ implies that the $m$th gene is activating the $n$th gene, whereas a negative value implies repression [34, 42]. The absolute value of these parameters indicates the strength of interaction.

The model given in (3) and (4), in the absence of any constraints, may be unidentifiable and may result in overfitted solutions [43]. Assumptions on network structures are therefore necessary to obtain a connectivity matrix that agrees with the biological knowledge. In a gene regulatory network (GRN), each gene is known to interact with only a few other genes. To this end, the coefficients $b_{nm}$ are estimated using sparsity constraints, as explained in the next section.
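To make the state evolution model concrete, the following is a minimal sketch that simulates (3) with the sigmoid in (4). The function names, the 3-gene coefficient matrix, and the noise level are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squash function f(x) = 1 / (1 + exp(-x)), as in (4)."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate_states(B, x0, K, sigma_w=0.0, seed=0):
    """Simulate x_k = B f(x_{k-1}) + w_k for k = 1..K, as in (3).

    B[n, m] holds b_nm, the influence of gene m on gene n.
    Returns an array of shape (K + 1, N) whose first row is x0.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    X = np.zeros((K + 1, x0.size))
    X[0] = x0
    for k in range(1, K + 1):
        noise = sigma_w * rng.standard_normal(x0.size)
        X[k] = B @ sigmoid(X[k - 1]) + noise
    return X

# Hypothetical sparse network: gene 1 activates gene 2, gene 2 represses gene 3.
B = np.array([[0.0, 0.0, 0.0],
              [1.5, 0.0, 0.0],
              [0.0, -2.0, 0.0]])
X = simulate_states(B, x0=np.zeros(3), K=5)
```

With zero initial states every sigmoid output is 0.5, so the first step is simply half of each row sum of `B`.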

A discrete linear Gaussian model for the microarray data is considered, which can be expressed at the $k$th time instant as [41]

$$\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k. \tag{5}$$

Stacking the unknown parameters together, the parameter vector to be estimated is

$$\mathbf{b} \triangleq [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \dots, \boldsymbol{\phi}_N], \tag{6}$$

where $\boldsymbol{\phi}_n = [b_{n1}, \dots, b_{nN}]$. Plugging the values of states from (3) into (5), it follows that

$$\mathbf{y}_k = \mathbf{R}_k \mathbf{b} + \mathbf{e}_k, \tag{7}$$

where

$$\mathbf{R}_k \triangleq \begin{bmatrix} \mathbf{f}_k & 0 & 0 & 0 \\ 0 & \mathbf{f}_k & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \mathbf{f}_k \end{bmatrix}, \tag{8}$$

$$\mathbf{f}_k \triangleq [f(x_{k-1,1}) \; \cdots \; f(x_{k-1,N})]. \tag{9}$$

Thus, the gene network inference problem boils down to the estimation of the system parameters $\mathbf{b}$ using the observations $\mathbf{y}_k$, where the effective noise $\mathbf{e}_k$ is the sum of the system and observation noises. The next section describes the proposed inference algorithm for sparse networks.
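The block structure of $\mathbf{R}_k$ in (8) can be checked numerically. The sketch below builds $\mathbf{R}_k$ with a Kronecker product and confirms that $\mathbf{R}_k \mathbf{b}$ reproduces the noise-free part of (3). It assumes that $\mathbf{b}$ is treated as a column vector formed by stacking the rows $\boldsymbol{\phi}_n$, and the helper name `build_regressor` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squash function from (4)."""
    return 1.0 / (1.0 + np.exp(-x))

def build_regressor(x_prev):
    """Return R_k = blockdiag(f_k, ..., f_k) with shape (N, N * N), as in (8)."""
    f_k = sigmoid(np.asarray(x_prev, dtype=float))  # row vector f_k from (9)
    N = f_k.size
    # kron(I_N, f_k) places one copy of f_k on each diagonal block.
    return np.kron(np.eye(N), f_k[None, :])

rng = np.random.default_rng(0)
N = 4
B = rng.standard_normal((N, N))   # illustrative coefficient matrix
x_prev = rng.standard_normal(N)   # illustrative previous state
b = B.reshape(-1)                 # stack phi_1, ..., phi_N (rows of B)
R_k = build_regressor(x_prev)
y_noise_free = R_k @ b
```

Row $n$ of `R_k` is nonzero only in the block that multiplies $\boldsymbol{\phi}_n$, which is why each gene's measurement depends only on its own row of coefficients.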

Figure 1: Block diagram of network inference methodology, CKFS. (Blocks: input time-series data $\mathbf{y}$; initialize $\mathbf{b}_0$, $\mathbf{x}_0$; CKF; CSKF; test $k = K$; output.)

3. Method

In this section, the methodology proposed to infer the system parameters in (3) is described. The proposed cubature Kalman filter with sparsity constraints (CKFS) approach is succinctly illustrated in Figure 1. The specific details of this algorithm are presented next.

3.1. Cubature Kalman Filter. The Kalman filter is a Bayesian filter which provides the optimal solution to a general linear state-space inference problem depicted by (1) and (2) and assumes a recursive predictive-update process. The underlying assumption of Gaussianity for the predictive and the likelihood densities simplifies the Kalman filter algorithm to a two-step process consisting of prediction and update of the mean and covariance of the hidden states. However, the presence of nonlinear functions in the state and measurement equations requires calculation of multidimensional integrals of the form (nonlinear function × Gaussian density) [36], which, in general, is computationally prohibitive. Several solutions to this problem have been proposed, including the EKF, which linearizes the nonlinear function by taking its first-order Taylor approximation, and the unscented Kalman filter (UKF), which approximates the probability density function (PDF) using a nonlinear transformation of the random variable. Recently, a new approach, the CKF, has been proposed, which evaluates the integrals numerically using spherical-radial cubature rules [36].

The next two subsections briefly explain the working of Bayesian filtering and the CKF solution for the nonlinear multidimensional integrals.

3.1.1. Time Update. Let the observations up to the time instant $k$ be denoted by $\mathbf{d}_k$, that is, $\mathbf{d}_k \triangleq [\mathbf{y}_1^T, \dots, \mathbf{y}_k^T]^T$. In the prediction phase, also called the time update of the Bayesian filter, the mean and covariance of the Gaussian predictive density are computed as follows:

$$\begin{aligned}
\hat{\mathbf{x}}_{k|k-1} &= E[\mathbf{f}(\mathbf{x}_{k-1}) \mid \mathbf{d}_{k-1}], \\
\mathbf{P}_{xx,k|k-1} &= E[\mathbf{f}(\mathbf{x}_{k-1}) \mathbf{f}^T(\mathbf{x}_{k-1})] - \hat{\mathbf{x}}_{k|k-1} \hat{\mathbf{x}}_{k|k-1}^T + \mathbf{Q}_{k-1},
\end{aligned} \tag{10}$$

where $E$ denotes the expectation operator and $\mathbf{x}_{k-1}$ is normally distributed with parameters $(\hat{\mathbf{x}}_{k-1|k-1}, \mathbf{P}_{xx,k-1|k-1})$. The expression for the mean is a consequence of the zero-mean nature of the Gaussian noise $\mathbf{w}$ and its independence from the past observations $\mathbf{d}_{k-1}$. The estimates $\hat{\mathbf{x}}_{k-1|k-1}$ and $\mathbf{P}_{xx,k-1|k-1}$ are assumed to be available from the previous iteration. Here, $\mathbf{P}_{xx,k|k-1}$ is an estimate of the error covariance matrix.

3.1.2. Measurement Update. Since the measurement noise is also Gaussian, the likelihood density is given by $\mathbf{y}_k \mid \mathbf{d}_{k-1} \sim \mathcal{N}(\mathbf{y}_k; \hat{\mathbf{y}}_{k|k-1}, \mathbf{P}_{yy,k|k-1})$. As the measurements become available at the $k$th time instant, the mean and covariance of the likelihood density are calculated as follows:

$$\begin{aligned}
\hat{\mathbf{y}}_{k|k-1} &= E[\mathbf{y}_k \mid \mathbf{d}_{k-1}], \\
\mathbf{P}_{yy,k|k-1} &= E[\mathbf{x}_k \mathbf{x}_k^T] - \hat{\mathbf{y}}_{k|k-1} \hat{\mathbf{y}}_{k|k-1}^T + \mathbf{S}_{k-1}.
\end{aligned} \tag{11}$$

The updated posterior density is obtained from the conditional joint density of the states and the measurements, which can be expressed as

$$\left( [\mathbf{x}_k^T \; \mathbf{y}_k^T]^T \mid \mathbf{d}_{k-1} \right) \sim \mathcal{N}\left( \begin{pmatrix} \hat{\mathbf{x}}_{k|k-1} \\ \hat{\mathbf{y}}_{k|k-1} \end{pmatrix}, \begin{pmatrix} \mathbf{P}_{xx,k|k-1} & \mathbf{P}_{xy,k|k-1} \\ \mathbf{P}_{xy,k|k-1}^T & \mathbf{P}_{yy,k|k-1} \end{pmatrix} \right), \tag{12}$$

where

$$\mathbf{P}_{xy,k|k-1} = E[\mathbf{x}_k \mathbf{x}_k^T] - \hat{\mathbf{x}}_{k|k-1} \hat{\mathbf{y}}_{k|k-1}^T \tag{13}$$

is the cross-covariance matrix between the states and the measurements. Hence, the states and the corresponding error covariance matrix are updated by calculating the innovation $\mathbf{y}_k - \hat{\mathbf{y}}_{k|k-1}$ and the Kalman gain $\mathbf{K}_{G,k}$:

$$\begin{aligned}
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_{G,k} (\mathbf{y}_k - \hat{\mathbf{y}}_{k|k-1}), \\
\mathbf{P}_{xx,k|k} &= \mathbf{P}_{xx,k|k-1} - \mathbf{K}_{G,k} \mathbf{P}_{yy,k|k-1} \mathbf{K}_{G,k}^T, \\
\mathbf{K}_{G,k} &= \mathbf{P}_{xy,k|k-1} \mathbf{P}_{yy,k|k-1}^{-1}.
\end{aligned} \tag{14}$$

The next subsection briefly describes the computation of the high-dimensional integrals present in the equations above.
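For the linear measurement model (5), the quantities in (11) and (13) reduce to simple expressions of the predicted covariance, so the update (14) can be sketched without any numerical integration. The following is a minimal illustration under that assumption; the function name and the use of a generic measurement noise covariance `S` are choices made here, not part of the paper.

```python
import numpy as np

def measurement_update(x_pred, P_pred, y, S):
    """Measurement update (14) for y_k = x_k + v_k with noise covariance S."""
    y_pred = x_pred                    # E[y_k | d_{k-1}], since v_k is zero mean
    P_yy = P_pred + S                  # (11) with E[x x^T] = P_pred + x_pred x_pred^T
    P_xy = P_pred                      # (13) under the same identity
    K_G = P_xy @ np.linalg.inv(P_yy)   # Kalman gain
    x_upd = x_pred + K_G @ (y - y_pred)
    P_upd = P_pred - K_G @ P_yy @ K_G.T
    return x_upd, P_upd

# Illustrative numbers: equal prior and noise covariances give a gain of 1/2.
x_upd, P_upd = measurement_update(
    x_pred=np.zeros(2), P_pred=np.eye(2), y=np.array([2.0, 4.0]), S=np.eye(2)
)
```

When the prior and measurement covariances are equal, the update lands halfway between the prediction and the observation and halves the covariance.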

3.1.3. Computation of Integrals Using Spherical-Radial Cubature Points. In order to determine the expectations in (10) using a numerical integration method, a spherical-radial cubature rule is applied. This method calculates the cubature points $\mathbf{X}_{j,k-1|k-1}$ as follows [36]:

$$\mathbf{X}_{j,k-1|k-1} = \mathbf{U}_{k-1|k-1} \boldsymbol{\zeta}_j + \hat{\mathbf{x}}_{k-1|k-1}, \tag{15}$$

where $\boldsymbol{\zeta}_j = \sqrt{\ell/2}\,[\mathbf{1}]_j$, $j = 1, \dots, \ell$, $\ell = 2N$ denotes the total number of cubature points, and $\mathbf{U}_{k-1|k-1}$ stands for the square root of the error covariance matrix, that is,

$$\mathbf{P}_{xx,k-1|k-1} = \mathbf{U}_{k-1|k-1} \mathbf{U}_{k-1|k-1}^T. \tag{16}$$

The cubature points are updated via the state equation

$$\mathbf{X}^{*}_{j,k|k-1} = g(\mathbf{X}_{j,k-1|k-1}). \tag{17}$$

The propagated cubature points yield the state and error covariance estimates

$$\begin{aligned}
\hat{\mathbf{x}}_{k|k-1} &= \frac{1}{\ell} \sum_{j=1}^{\ell} \mathbf{X}^{*}_{j,k|k-1}, \\
\mathbf{P}_{xx,k|k-1} &= \frac{1}{\ell} \sum_{j=1}^{\ell} \mathbf{X}^{*}_{j,k|k-1} \mathbf{X}^{*T}_{j,k|k-1} - \hat{\mathbf{x}}_{k|k-1} \hat{\mathbf{x}}_{k|k-1}^T + \mathbf{Q}_{k-1}.
\end{aligned} \tag{18}$$

The integrals in (11) and (13) can be evaluated in a similar manner. The next subsection explains the estimation of parameters in the system.
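The cubature time update in (15) to (18) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes that $[\mathbf{1}]_j$ denotes the $j$th column of $[\mathbf{I}, -\mathbf{I}]$ (the usual generator for the third-degree spherical-radial rule), uses a Cholesky factor as the square root in (16), and takes the state function `g` as an argument (for the model in (3), `g` would map `x` to `B @ sigmoid(x)` with the current coefficient estimate).

```python
import numpy as np

def cubature_points(x_hat, P):
    """Return the 2N cubature points of (15), one per row."""
    N = x_hat.size
    U = np.linalg.cholesky(P)                    # P = U U^T, as in (16)
    unit = np.hstack([np.eye(N), -np.eye(N)])    # assumed generator: columns +/- e_i
    zeta = np.sqrt(N) * unit                     # sqrt(l / 2) with l = 2N
    return (U @ zeta).T + x_hat

def time_update(x_hat, P, g, Q):
    """Propagate the points through g (17) and form the estimates in (18)."""
    X = cubature_points(x_hat, P)
    X_star = np.array([g(p) for p in X])
    n_points = X_star.shape[0]
    x_pred = X_star.sum(axis=0) / n_points
    P_pred = X_star.T @ X_star / n_points - np.outer(x_pred, x_pred) + Q
    return x_pred, P_pred

# Sanity check with a linear map, for which the rule is exact.
A = np.array([[1.0, 0.5], [0.0, 1.0]])
x0 = np.array([1.0, -2.0])
P0 = np.array([[2.0, 0.3], [0.3, 1.0]])
Q = 0.1 * np.eye(2)
x_pred, P_pred = time_update(x0, P0, lambda x: A @ x, Q)
```

For a linear state function the cubature estimates coincide with the standard Kalman prediction, which is a convenient way to validate an implementation before plugging in the sigmoid model.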

3.2. Estimation of Sparse Parameters Using Kalman Filter. The state estimates are obtained using the CKF as described in the previous subsection. In order to estimate the unknown parameters in the system model, one of the most commonly used methods involves stacking the parameters with the states and estimating them together. The estimation process performed in this manner is called joint estimation. Another method for the estimation of parameters consists of a two-step recursive process, which is termed dual estimation. This process estimates the states in the first step and, with the assumption that the states are known, estimates the parameters in the second step. These steps are repeated until the algorithm converges to the true values or until the available observations are exhausted. This paper makes use of the latter technique.

The vector $\mathbf{b}$, as defined in (6), is assumed to evolve as a Gauss-Markov model. As discussed previously, the states are assumed to be known at this step. The system evolution equations can therefore be expressed as

$$\begin{aligned}
\mathbf{b}_k &= \mathbf{b}_{k-1} + \boldsymbol{\eta}_{k-1}, \\
\mathbf{y}_k &= \mathbf{R}_k \mathbf{b}_k + \mathbf{e}_k,
\end{aligned} \tag{19}$$

where $\boldsymbol{\eta}_k$ stands for the i.i.d. Gaussian noise and $\mathbf{R}_k$ is as defined in (8). It is observed that (19) is a system of linear equations with additive Gaussian noise, and therefore the Kalman filter is the optimal choice for the estimation of the parameter vector. The standard predict and update steps involved in the Kalman filter are summarized as follows:

$$\begin{aligned}
\hat{\mathbf{b}}_{k|k-1} &= \hat{\mathbf{b}}_{k-1|k-1}, \\
\mathbf{P}_{bb,k|k-1} &= \mathbf{P}_{bb,k-1|k-1} + \boldsymbol{\Sigma}_{\eta_k}, \\
\mathbf{u}_k &= \mathbf{y}_k - \mathbf{R}_k \hat{\mathbf{b}}_{k|k-1}, \\
\mathbf{K}_G &= \mathbf{P}_{bb,k|k-1} \mathbf{R}_k^T \left( \mathbf{R}_k \mathbf{P}_{bb,k|k-1} \mathbf{R}_k^T + \sigma_e^2 \mathbf{I} \right)^{-1}, \\
\hat{\mathbf{b}}_{k|k} &= \hat{\mathbf{b}}_{k|k-1} + \mathbf{K}_G \mathbf{u}_k, \\
\mathbf{P}_{bb,k|k} &= (\mathbf{I} - \mathbf{K}_G \mathbf{R}_k) \mathbf{P}_{bb,k|k-1},
\end{aligned} \tag{20}$$

where $\mathbf{K}_G$ denotes the Kalman gain and $\mathbf{P}$ represents the error covariance matrix.
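One predict and update cycle of the parameter filter in (20) can be sketched as below. The function and argument names are illustrative assumptions; the predicted mean is taken equal to the previous estimate because the random-walk noise in (19) is zero mean.

```python
import numpy as np

def kf_parameter_step(b_prev, P_prev, R_k, y_k, Sigma_eta, sigma_e2):
    """One Kalman filter cycle for the random-walk parameter model (19)."""
    b_pred = b_prev                           # zero-mean noise leaves the mean unchanged
    P_pred = P_prev + Sigma_eta               # predicted error covariance
    u_k = y_k - R_k @ b_pred                  # innovation
    S_k = R_k @ P_pred @ R_k.T + sigma_e2 * np.eye(R_k.shape[0])
    K_G = P_pred @ R_k.T @ np.linalg.inv(S_k) # Kalman gain
    b_upd = b_pred + K_G @ u_k
    P_upd = (np.eye(b_prev.size) - K_G @ R_k) @ P_pred
    return b_upd, P_upd

# Illustrative scalar case: unit prior variance, unit noise variance, y = 2.
b_upd, P_upd = kf_parameter_step(
    b_prev=np.array([0.0]), P_prev=np.array([[1.0]]),
    R_k=np.array([[1.0]]), y_k=np.array([2.0]),
    Sigma_eta=np.array([[0.0]]), sigma_e2=1.0,
)
```

Only an $N \times N$ matrix is inverted per step here, even though the parameter vector has $N^2$ entries.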

The Kalman filter algorithm is based on an $\ell_2$-norm minimization criterion. As the gene networks are known to be highly sparse, the parameter vector is expected to have only a few nonzero values. A more accurate approach for estimating such a vector would be to introduce an additional constraint on its $\ell_1$-norm, which is the core idea in compressed sensing [38, 44]. Such an $\ell_1$-norm constraint provides a unique solution to the underdetermined set of equations [45]. Therefore, instead of a simple $\ell_2$-norm minimization, the following constrained optimization problem is considered:

$$\min_{\hat{\mathbf{b}}_k} \left\| \mathbf{b}_k - \hat{\mathbf{b}}_k \right\|_2^2 \quad \text{s.t.} \quad \left\| \hat{\mathbf{b}}_k \right\|_1 \le \epsilon. \tag{21}$$

The importance of this constraint can be judged by the fact that, without it, the system would be rendered unidentifiable [43].

The problem (21) can be solved using a pseudomeasurement (PM) method, which incorporates the inequality constraint in (21) into the filtering process by assuming an artificial measurement $\|\hat{\mathbf{b}}_k\|_1 - \epsilon = 0$. This is concisely expressed as

$$0 = \mathbf{R}_\tau \hat{\mathbf{b}}_k - \epsilon, \qquad \mathbf{R}_\tau = [\operatorname{sign}(\hat{b}_\tau(1)) \; \cdots \; \operatorname{sign}(\hat{b}_\tau(N))]. \tag{22}$$

The value of the covariance matrix $\boldsymbol{\Sigma}_\epsilon = \sigma_\epsilon^2 \mathbf{I}$ of the pseudonoise $\epsilon$ is selected in a similar manner as the process noise covariance in the EKF algorithm. However, it is found that large values of the variance, that is, $\sigma_\epsilon^2 \ge 100$, prove sufficient in most cases [38]. Further details on selecting these parameters can be found in [38, 46]. The PM method solves (21) in a recursive manner for $K_\tau$ iterations using the following steps:

$$\begin{aligned}
\mathbf{K}_G^{\tau} &= \mathbf{P}_\tau \mathbf{R}_\tau^T \left( \mathbf{R}_\tau \mathbf{P}_\tau \mathbf{R}_\tau^T + \boldsymbol{\Sigma}_\epsilon \right)^{-1}, \\
\hat{\mathbf{b}}_{\tau+1} &= (\mathbf{I} - \mathbf{K}_G^{\tau} \mathbf{R}_\tau) \hat{\mathbf{b}}_\tau, \\
\mathbf{P}_{\tau+1} &= (\mathbf{I} - \mathbf{K}_G^{\tau} \mathbf{R}_\tau) \mathbf{P}_\tau.
\end{aligned} \tag{23}$$

At each $k$th time instant, $\mathbf{P}_{bb,k|k}$ and $\hat{\mathbf{b}}_{k|k}$ obtained from (20) are considered as initial values, that is, $\hat{\mathbf{b}}_1 = \hat{\mathbf{b}}_{k|k}$ and $\mathbf{P}_1 = \mathbf{P}_{bb,k|k}$, which is the error covariance matrix. The value of $K_\tau$ is equal to the number of constraints, that is, the expected number of nonzero coefficients $b_{nm}$ in the system. Possible ways for calculating $K_\tau$ include the minimum description length (MDL) principle and the Bayesian information criterion (BIC).

Algorithm 1: Network inference CKFS.

Input: time series data set $\mathbf{y}$
Initialize $I$, $K$, $\boldsymbol{\phi}_0$, $\mathbf{x}_0$
for $k = 1, \dots, K$ do
    Find the state estimates using CKF, following the time and measurement update steps in Section 3
    Estimate parameters $\mathbf{b}_k$ from $\mathbf{x}_k$ and $\mathbf{y}_k$ using (20)
    for $\tau = 1, \dots, K_\tau$ do
        Update the parameters $\mathbf{b}_k$ using (23)
    end for
end for
return
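The pseudomeasurement refinement in (22) and (23) can be sketched as a short loop that starts from the Kalman estimate and its covariance. This is a minimal illustration under stated assumptions: the pseudonoise covariance is a scalar `sigma_eps2` (the constraint is a single scalar measurement), the default of 100 follows the guideline quoted from [38], and the function name and example values are hypothetical.

```python
import numpy as np

def pm_refine(b, P, n_iter, sigma_eps2=100.0):
    """Apply n_iter pseudomeasurement iterations (23) to shrink the l1-norm."""
    b = np.array(b, dtype=float)
    P = np.array(P, dtype=float)
    I = np.eye(b.size)
    for _ in range(n_iter):
        R_tau = np.sign(b)[None, :]                       # sign row vector, (22)
        denom = R_tau @ P @ R_tau.T + sigma_eps2          # 1 x 1 innovation variance
        K = P @ R_tau.T / denom                           # gain, shape (n, 1)
        b = (I - K @ R_tau) @ b
        P = (I - K @ R_tau) @ P
    return b, P

# Illustrative estimate with one small entry that the constraint pushes toward zero.
b0 = np.array([1.0, 0.1, -0.5])
P0 = np.eye(3)
b_ref, P_ref = pm_refine(b0, P0, n_iter=3)
```

Each iteration moves every nonzero entry toward zero by an amount governed by its variance, so entries with small magnitude and large uncertainty are suppressed first.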

3.3. Inference Algorithm. The network inference algorithm is summarized in Algorithm 1. The algorithm consists of a recursive process which repeats itself for the number of observations present in the time-series data. For each time sample, the state estimate is obtained using the CKF and the parameter estimate is obtained using the KF. Since the parameters are expected to be sparse, the estimates are then refined further using the CSKF algorithm. This iterative process results in a simple and accurate algorithm for gene network inference while considering a complex nonlinear model.

4. Cramér-Rao Bound

The performance of an estimator can be judged by comparing it with theoretical lower bounds proposed in parameter estimation theory. The CRB establishes a lower bound on the MSE of an unbiased estimator [47]. In particular, the CRB states that the covariance matrix of the estimator $\hat{\mathbf{b}}$ is lower bounded by

$$E\left[ (\hat{\mathbf{b}} - \mathbf{b}) (\hat{\mathbf{b}} - \mathbf{b})^T \right] \succeq [\mathbf{I}(\mathbf{b})]^{-1}, \tag{24}$$

where the matrix inequality $\succeq$ is to be interpreted in the semidefinite sense and $\mathbf{I}(\mathbf{b})$ is the Fisher information matrix (FIM):

$$\mathbf{I}(\mathbf{b}) = E\left[ \left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right) \left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right)^T \right]. \tag{25}$$

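The CRB for gene network inference can be calculated as follows. By stacking all the observations for $k = 1, \dots, K$, (7) can be written compactly in the matrix form

$$\mathbf{y} = \mathbf{R} \mathbf{b} + \mathbf{e}, \tag{26}$$

where $\mathbf{y} = [\mathbf{y}_1^T, \dots, \mathbf{y}_K^T]^T$, $\mathbf{R} = [\mathbf{R}_1^T, \dots, \mathbf{R}_K^T]^T$, and $\mathbf{e} = [\mathbf{e}_1^T, \dots, \mathbf{e}_K^T]^T$. The PDF $p(\mathbf{y} \mid \mathbf{b})$ is expressed as

$$p(\mathbf{y} \mid \mathbf{b}) = C \exp\left( -\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T (\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2} \right), \tag{27}$$

where $C$ is a constant. The derivative of $\ln p(\mathbf{y} \mid \mathbf{b})$ can be expressed as

$$\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} = -\frac{\partial}{\partial \mathbf{b}} \left[ \frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T (\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2} \right] = \frac{\mathbf{R}^T \mathbf{y} - \mathbf{R}^T \mathbf{R} \mathbf{b}}{\sigma_e^2}. \tag{28}$$

It now follows that

$$\left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right) \left( \frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} \right)^T = \frac{\mathbf{R}^T (\mathbf{y} - \mathbf{R}\mathbf{b}) (\mathbf{y} - \mathbf{R}\mathbf{b})^T \mathbf{R}}{\sigma_e^4}. \tag{29}$$

By taking the expectation of (29) and using $E[(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^T] = E[\mathbf{e}\mathbf{e}^T] = \sigma_e^2 \mathbf{I}$, the FIM in (25) is given by

$$\mathbf{I}(\mathbf{b}) = \frac{\mathbf{R}^T \mathbf{R}}{\sigma_e^2}. \tag{30}$$

The inverse of the FIM in (30) can be used to place a lower bound on the estimation error of the parameter vector $\mathbf{b}$. Figure 2 shows the comparison of the MSE of the CKFS algorithm with the CRB as a function of the number of samples $K$ for one representative gene from the eight-gene network considered in Section 5.1. It is observed that the MSE of the estimated parameters decreases with increasing number of samples.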
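The bound implied by (30) is straightforward to evaluate once the stacked regressor matrix is available. The sketch below returns the diagonal of the inverse FIM, that is, the per-parameter lower bounds on the MSE. The function name and the toy regressor are illustrative assumptions; the sketch also assumes that $\mathbf{R}^T \mathbf{R}$ is invertible, which requires enough samples.

```python
import numpy as np

def crb_diagonal(R, sigma_e2):
    """Diagonal of the inverse FIM in (30): lower bounds on each parameter's MSE."""
    fim = R.T @ R / sigma_e2
    return np.diag(np.linalg.inv(fim))

# Toy example: stacking the same 2 x 2 identity block for K = 4 and K = 8 samples.
sigma_e2 = 2.0
bound_4 = crb_diagonal(np.vstack([np.eye(2)] * 4), sigma_e2)
bound_8 = crb_diagonal(np.vstack([np.eye(2)] * 8), sigma_e2)
```

In this toy case doubling the number of stacked samples halves the bound, which matches the decreasing trend with sample size described above.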

5. Results and Discussion

The simulation results of the CKFS algorithm are discussed in this section. The performance is first tested on synthetic data obtained from randomly generated Boolean networks under various scenarios and performance metrics. The algorithm is then assessed on the DREAM4 networks and the IRMA network.

5.1. Synthetic Data. Time-series data is produced from randomly generated Boolean networks using the system model (3) and (5). Two scenarios are considered for this purpose.

Figure 2: Cramér-Rao bound on the estimation of parameters. The MSE for one of the representative $\theta$ is shown here for a network consisting of 8 vertices. (Horizontal axis: sample size, 4 to 16. Vertical axis: MSE on a log10 scale, $10^{-7}$ to $10^{0}$. Curves: CRB, CKFS.)

First, the comparison is performed by varying the sample size while keeping the network size fixed. The gene network consists of 8 genes and 20 edges. In terms of network estimation, if the algorithm predicts an edge between two nodes that is not present in reality, an error referred to as a false alarm error (F) is said to have occurred. Another situation is the indication of the absence of an edge in the graph which is in fact present in the real network. This kind of error is termed missed detection (M). The summation of these two errors, normalized over the total number of edges in the network, yields the Hamming distance. It is also important to consider the probability of predicting the true connections correctly, which is assessed by the true connections (T) metric. An algorithm with low Hamming distance and small false alarm error is particularly desirable, as predicting an edge erroneously can be troublesome for biologists. True connections indicate the reliability of the predictions. Figure 3 illustrates the performance of the CKFS algorithm and that of the EKF algorithm proposed in [34] in terms of the metrics described above. It is important to mention here that the same system model is assumed by both the CKFS and EKF algorithms for the purpose of this simulation. These metrics are the same as those used in [15]. The variances of both the system and measurement noises, $\sigma_w^2$ and $\sigma_v^2$, respectively, are taken to be $10^{-5}$ in all the simulations and are assumed to be known. It is noticed that EKF has a slightly lower false alarm rate when the number of samples is small; however, as the number of samples increases, CKFS yields a lower false alarm error. The Hamming distance for CKFS is also smaller than that for EKF, indicating a smaller cumulative error. True connections show a consistent behavior for the two algorithms when the number of samples is increased, with CKFS predicting connections more accurately. These experiments show the superiority of CKFS in terms of lower error rate.
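The error metrics described above can be computed directly from the true and predicted adjacency matrices. The following sketch assumes binary adjacency matrices and normalizes every count by the number of true edges, following the Figure 3 caption; the function name and the two-node example are hypothetical.

```python
import numpy as np

def edge_metrics(A_true, A_pred):
    """False alarms (F), missed detections (M), Hamming distance, true connections (T).

    All counts are normalized by the number of edges in the true network
    (an assumption made here, following the Figure 3 caption).
    """
    A_true = np.asarray(A_true, dtype=bool)
    A_pred = np.asarray(A_pred, dtype=bool)
    n_edges = A_true.sum()
    F = np.sum(A_pred & ~A_true) / n_edges   # predicted edges that do not exist
    M = np.sum(~A_pred & A_true) / n_edges   # existing edges that were not predicted
    T = np.sum(A_pred & A_true) / n_edges    # existing edges that were predicted
    return {"F": F, "M": M, "hamming": F + M, "T": T}

# Two true edges; the prediction recovers one of them and adds one spurious edge.
metrics = edge_metrics([[0, 1], [1, 0]], [[0, 1], [0, 1]])
```

With this normalization, M and T always sum to one, so the Hamming distance adds information mainly through the false alarm term.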

Figure 3: (a), (b), and (c): False alarm errors, Hamming distance, and true connections. The synthetic networks consist of 8 vertices and 20 edges. The metric is normalized over the number of edges. CKFS gives lower error and predicts more true connections with the increase in the sample size of data. (Horizontal axis in each panel: sample size, 5 to 40. Curves: EKF, CKFS.)

To obtain a more rigorous evaluation, the performance of the algorithms is then compared in a scenario which considers the sample size to be fixed and assumes networks of different sizes. The receiver operating characteristic (ROC) curves are plotted as performance measures. A higher area under the ROC curve (AUROC) shows more true positives for a given false positive rate and therefore indicates better classification. The performance of CKFS($N$, $E$, $K$) and EKF($N$, $E$, $K$) is shown in Figure 4, where $N$ stands for the number of nodes, $E$ represents the number of edges, and $K$ denotes the number of time points. It is observed that CKFS outperforms the EKF for networks of different sizes.
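The AUROC used in this comparison can be computed without tracing the full curve, using the rank-based (Mann-Whitney) identity: it equals the fraction of (true edge, non-edge) pairs in which the true edge receives the higher score. The sketch below assumes that each candidate edge has a confidence score, for example the magnitude of the estimated coefficient $|b_{nm}|$; that scoring choice and the function name are assumptions made for illustration.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the probability that a true edge outscores a non-edge (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Four candidate edges: the first and third are true edges.
area = auroc(scores=[0.9, 0.8, 0.3, 0.1], labels=[1, 0, 1, 0])
```

Here three of the four (true edge, non-edge) pairs are ranked correctly, giving an area of 0.75; a random scorer would give about 0.5.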

The complexity of the two algorithms is compared for synthetically generated networks with the number of genes equal to 10, 20, 30, and 40. The sample size is kept to 50 time points for each of these networks, and the run time for the EKF and CKFS algorithms is calculated as shown in Table 1. It is noted that EKF is faster for smaller network sizes, but as the network size increases, its run time becomes much larger than that for CKFS. The main reason for this is that EKF [34] estimates the states and parameters by stacking them together, which requires large-sized matrix multiplications at each iteration.

The benefit associated with performing dual estimation, as in CKFS, is that the parameters are estimated separately from the states. Since the system is linear and one-to-one for the parameters, inversion of much smaller matrices can be performed, reducing the computational complexity of the CKFS algorithm. CKFS is therefore particularly attractive for large-sized networks.

Figure 4: ROC curves for the performance of CKFS and EKF using synthetic data. ($N$, $E$, $K$) for (a), (b), and (c): (5, 10, 20), (10, 12, 20), and (15, 19, 20). The area under the ROC curve for CKFS is more than that for EKF for various sized networks. (Axes in each panel: false positive rate versus true positive rate. Curves: CKFS, EKF.)

Table 1: Run time in seconds for EKF and CKFS algorithms for varying network sizes for synthetically generated data. The number of sample points is fixed to 50.

Number of genes   10     20    30     40
EKF               0.16   1.9   16.5   84
CKFS              1.2    4.3   11.5   24.1

5.2. DREAM4 Gene Networks. Several in silico networks have been produced in order to benchmark the performance of gene network inference algorithms; the dialogue on reverse engineering assessment and methods (DREAM) in silico networks serve as one of the popular benchmarks used for this purpose [39, 48]. In this section, the performance of the CKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge.

Five networks are produced using the known GRNs of Escherichia coli and Saccharomyces cerevisiae. The data set for each 10-gene network consists of 21 data points for five different perturbations. The inference is performed by using all the perturbations. The 100-gene network consists of data sets for ten perturbations. AUROC and area under the precision-recall curve (AUPR) are calculated for the five networks of both the data sets and shown in Tables 2 and 3, respectively. The quantities precision and recall are defined as $P = T/(T + F)$ and $R = T/(T + M)$, respectively. For comparison purposes, the values of the two quantities for the time-series network identification (TSNI) algorithm, which exploits ordinary differential equations, are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.

Figure 5: The inferred IRMA networks. (a), (b), and (c): Gold standard, inferred network using CKFS, and inferred network using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate the edges that are correct but whose directions are reversed, and red arrows indicate false positives. (Nodes: GAL80, GAL4, SWI5, CBF1, ASH1.)

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for DREAM4 10-gene networks for the five different networks.

Algorithm     Network 1     Network 2     Network 3     Network 4     Network 5
ODE [39]      0.62 (0.27)   0.63 (0.32)   0.58 (0.21)   0.63 (0.23)   0.68 (0.25)
CKFS          0.63 (0.40)   0.67 (0.50)   0.72 (0.50)   0.75 (0.49)   0.81 (0.42)
Random [39]   0.55 (0.18)   0.55 (0.19)   0.55 (0.17)   0.57 (0.17)   0.56 (0.16)

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for DREAM4 100-gene networks for the five different networks.

Algorithm     Network 1      Network 2      Network 3      Network 4      Network 5
ODE [39]      0.55 (0.02)    0.55 (0.03)    0.60 (0.03)    0.54 (0.02)    0.59 (0.03)
CKFS          0.67 (0.13)    0.57 (0.08)    0.60 (0.10)    0.62 (0.10)    0.60 (0.07)
Random [39]   0.50 (0.002)   0.50 (0.002)   0.50 (0.002)   0.50 (0.002)   0.50 (0.002)
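As a small worked illustration of the precision and recall definitions used for the AUPR, the sketch below evaluates $P = T/(T + F)$ and $R = T/(T + M)$ from raw counts of true connections, false alarms, and missed detections. The function name and the counts are hypothetical and do not correspond to any result reported in the tables.

```python
def precision_recall(T, F, M):
    """Precision P = T / (T + F) and recall R = T / (T + M) from raw counts."""
    precision = T / (T + F)
    recall = T / (T + M)
    return precision, recall

# Hypothetical counts: 8 correct edges, 2 false alarms, 8 missed edges.
precision, recall = precision_recall(T=8, F=2, M=8)
```

In this example most predicted edges are correct (precision 0.8) while only half of the true edges are recovered (recall 0.5), the kind of trade-off that a precision-recall curve traces as the decision threshold varies.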

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which represents the expression samples at 21 time points. The switch-on data consists of 16 sample points and is obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be identical to those in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because each gene is known to interact with only a few other genes. The use of CKF and the dual estimation of states and parameters renders the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS gives advantages over EKF in terms of smaller run time for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using DREAM4 10-gene and 100-gene networks and IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, "Computational systems biology," Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, "Stochastic modeling and simulation of gene networks," IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, "Understanding micro-RNA regulation: a computational perspective," IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, "Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances," IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, "Reverse engineering of gene regulatory networks: a comparative study," Eurasip Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, "Current approaches to gene regulatory network modelling," BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, "Modeling and simulation of genetic regulatory systems: a literature review," Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, "Inferring quantitative models of regulatory networks from expression data," Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, "Clustering time series gene expression data based on sum-of-exponentials fitting," EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, "Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure," Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, "Identifying differentially expressed genes in microarray experiments with model-based variance estimation," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, "Gene clustering based on cluster-wide mutual information," Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, "Inferring connectivity of genetic regulatory networks using information theoretic criteria," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, "Inference of gene regulatory networks based on a universal minimum description length," Eurasip Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, "Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, "Inferring gene regulatory networks from time series data using the minimum description length principle," Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, "A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks," Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, "Robust inference of the context specific structure and temporal dynamics of gene regulatory network," BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, "Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM," in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling gene expression data using dynamic Bayesian networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, "A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling," BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, "Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks," Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D'Alche-Buc, "Gene networks inference using dynamic Bayesian networks," Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, "Modelling biological responses using gene expression profiling and linear dynamical systems," Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d'Alché-Buc, "Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference," Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F-X Wu W-J Zhang and A J Kusalik ldquoModeling geneexpression from microarray expression data with state-space

Advances in Bioinformatics 11

equationsrdquo in Pacific Symposium on Biocomputing R B Alt-man A K Dunker L Hunter T A Jung and T E Klein Edspp 581ndash592 World Scientific River Edge NJ USA 2004

[29] R Yamaguchi S Yoshida S Imoto T Higuchi and S MiyanoldquoFinding module-based gene networks with state-spacemodelsmining high-dimensional and short time-course geneexpression datardquo IEEE Signal Processing Magazine vol 24 no1 pp 37ndash46 2007

[30] O Hirose R Yoshida S Imoto et al ldquoStatistical inferenceof transcriptional module-based gene networks from timecourse gene expression profiles by using state space modelsrdquoBioinformatics vol 24 no 7 pp 932ndash942 2008

[31] J Angus M Beal J Li C Rangel and D Wild ldquoInferringtranscriptional networks using prior biological knowledge andconstrained state-space modelsrdquo in Learning and Inference inComputational Systems Biology N Lawrence M GirolamiM Rattray and G Sanguinetti Eds pp 117ndash152 MIT PressCambridge UK 2010

[32] C Rangel J Angus Z Ghahramani et al ldquoModeling T-cell acti-vation using gene expression profiling and state-space modelsrdquoBioinformatics vol 20 no 9 pp 1361ndash1372 2004

[33] A Noor E SerpedinMN Nounou andHNNounou ldquoInfer-ring gene regulatory networks via nonlinear state-space modelsand exploiting sparsityrdquo IEEEACM Transactions on Computa-tional Biology and Bioinformatics vol 9 no 4 pp 1203ndash12112012

[34] Z Wang X Liu Y Liu J Liang and V Vinciotti ldquoAn extendedkalman filtering approach tomodeling nonlinear dynamic generegulatory networks via short gene expression time seriesrdquoIEEEACM Transactions on Computational Biology and Bioin-formatics vol 6 no 3 pp 410ndash419 2009

[35] A Noor E Serpedin M Nounou H Nounou N Mohamedand L Chouchane ldquoAn overview of the statistical methodsused for inferring gene regulatory networks and proteinproteininteraction networksrdquo Advances in Bioinformatics vol 2013Article ID 953814 12 pages 2013

[36] I Arasaratnam and S Haykin ldquoCubature kalman filtersrdquo IEEETransactions on Automatic Control vol 54 no 6 pp 1254ndash12692009

[37] A Noor E Serpedin M N Nounou and H N Nounou ldquoAcubature Kalman filter approach for inferring gene regulatorynetworks using time series datardquo in Proceedings of the IEEEInternationalWorkshop onGenomic Signal Processing and Statis-tics (GENSIPS rsquo11) pp 25ndash28 2011

[38] A Carmi P Gurfil and D Kanevsky ldquoMethods for sparsesignal recovery using kalman filtering with embedded rseudo-measurement norms and quasi-normsrdquo IEEE Transactions onSignal Processing vol 58 no 4 pp 2405ndash2409 2010

[39] C A Penfold andD LWild ldquoHow to infer gene networks fromexpression profiles revisitedrdquo Interface Focus pp 857ndash870 2011

[40] I Cantone L Marucci F Iorio et al ldquoA yeast synthetic networkfor in vivo assessment of reverse-engineering and modelingapproachesrdquo Cell vol 137 no 1 pp 172ndash181 2009

[41] Y Huang I M Tienda-Luna and Y Wang ldquoReverse engineer-ing gene regulatory networks a survey of statistical modelsrdquoIEEE Signal Processing Magazine vol 26 no 1 pp 76ndash97 2009

[42] Z Wang F Yang D W C Ho S Swift A Tucker and X LiuldquoStochastic dynamic modeling of short gene expression time-series datardquo IEEE Transactions on Nanobioscience vol 7 no 1pp 44ndash55 2008

[43] H Xiong and Y Choe ldquoStructural systems identification ofgenetic regulatory networksrdquo Bioinformatics vol 24 no 4 pp553ndash560 2008

[44] R Tibshirani ldquoRegression shrinkage and selection via the lassordquoJournal of the Royal Statistical Society B vol 58 no 1 pp 267ndash288 1996

[45] E J Cands andT Tao ldquoDecoding by linear programmingrdquo IEEETransactions on Information Theory vol 51 no 12 pp 4203ndash4215 2005

[46] J D Geeter H V Brussel and J D Schutter ldquoA smoothly con-strained Kalman filterrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 19 no 10 pp 1171ndash1177 1997

[47] S M Kay Fundamentals of Statistical Signal Processing Estima-tion Theory Prentice-Hall New York NY USA 1993

[48] httpwikic2b2columbiaedudream


the mean and covariance of the Gaussian posterior density are computed as follows:

$$
\begin{aligned}
\hat{\mathbf{x}}_{k|k-1} &= E\left[\mathbf{f}(\mathbf{x}_{k-1}) \mid \mathbf{d}_{k-1}\right], \\
\mathbf{P}_{xx,k|k-1} &= E\left[\mathbf{f}(\mathbf{x}_{k-1})\,\mathbf{f}^{T}(\mathbf{x}_{k-1})\right] - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{x}}_{k|k-1}^{T} + \mathbf{Q}_{k-1},
\end{aligned}
\tag{10}
$$

where $E$ denotes the expectation operator and $\mathbf{x}_{k-1}$ is normally distributed with parameters $(\hat{\mathbf{x}}_{k-1|k-1}, \mathbf{P}_{xx,k-1|k-1})$. The third equality is a consequence of the zero-mean nature of the Gaussian noise $\mathbf{w}$ and its independence from $\mathbf{d}_{k}$. The estimates $\hat{\mathbf{x}}_{k-1|k-1}$ and $\mathbf{P}_{xx,k-1|k-1}$ are assumed to be available from the previous iteration. Here $\mathbf{P}_{xx,k|k-1}$ is an estimate of the error covariance matrix.

3.1.2. Measurement Update. Since the measurement noise is also Gaussian, the likelihood density is given by $\mathbf{y}_{k} \mid \mathbf{d}_{k-1} \sim \mathcal{N}(\mathbf{y}_{k}; \hat{\mathbf{y}}_{k|k-1}, \mathbf{P}_{yy,k|k-1})$. As the measurements become available at the $k$th time instant, the mean and covariance of the likelihood density are calculated as follows:

$$
\begin{aligned}
\hat{\mathbf{y}}_{k|k-1} &= E\left[\mathbf{y}_{k} \mid \mathbf{d}_{k-1}\right], \\
\mathbf{P}_{yy,k|k-1} &= E\left[\mathbf{x}_{k}\mathbf{x}_{k}^{T}\right] - \hat{\mathbf{y}}_{k|k-1}\hat{\mathbf{y}}_{k|k-1}^{T} + \mathbf{S}_{k-1}.
\end{aligned}
\tag{11}
$$

The updated posterior density obtained from the conditional joint density of the states and the measurements can be expressed as

$$
\left(\left[\mathbf{x}_{k}^{T}\ \mathbf{y}_{k}^{T}\right]^{T} \mid \mathbf{d}_{k-1}\right) \sim \mathcal{N}\left(
\begin{pmatrix} \hat{\mathbf{x}}_{k|k-1} \\ \hat{\mathbf{y}}_{k|k-1} \end{pmatrix},
\begin{pmatrix} \mathbf{P}_{xx,k|k-1} & \mathbf{P}_{xy,k|k-1} \\ \mathbf{P}_{xy,k|k-1}^{T} & \mathbf{P}_{yy,k|k-1} \end{pmatrix}
\right),
\tag{12}
$$

where

$$
\mathbf{P}_{xy,k|k-1} = E\left[\mathbf{x}_{k}\mathbf{x}_{k}^{T}\right] - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{y}}_{k|k-1}^{T}
\tag{13}
$$

is the cross-covariance matrix between the states and the measurements. Hence, the states and the corresponding error covariance matrix are updated by calculating the innovation $\mathbf{y}_{k} - \hat{\mathbf{y}}_{k|k-1}$ and the Kalman gain $\mathbf{K}_{G_k}$:

$$
\begin{aligned}
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_{G_k}\left(\mathbf{y}_{k} - \hat{\mathbf{y}}_{k|k-1}\right), \\
\mathbf{P}_{xx,k|k} &= \mathbf{P}_{xx,k|k-1} - \mathbf{K}_{G_k}\mathbf{P}_{yy,k|k-1}\mathbf{K}_{G_k}^{T}, \\
\mathbf{K}_{G_k} &= \mathbf{P}_{xy,k|k-1}\mathbf{P}_{yy,k|k-1}^{-1}.
\end{aligned}
\tag{14}
$$
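The update in (14) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `measurement_update` and its argument names are hypothetical, and the predicted moments are assumed to have been computed already.

```python
import numpy as np

def measurement_update(x_pred, P_xx, y_pred, P_yy, P_xy, y):
    """One measurement update following (14).

    x_pred, P_xx : predicted state mean and covariance
    y_pred, P_yy : predicted measurement mean and covariance
    P_xy         : state/measurement cross-covariance
    y            : measurement observed at the current time instant
    """
    # Kalman gain K = P_xy P_yy^{-1}, obtained with a linear solve
    # instead of forming the explicit inverse.
    K = np.linalg.solve(P_yy.T, P_xy.T).T
    x_upd = x_pred + K @ (y - y_pred)     # correct the mean with the innovation
    P_upd = P_xx - K @ P_yy @ K.T         # shrink the covariance
    return x_upd, P_upd, K
```

When the measurement is the state plus noise, so that P_xy = P_xx and P_yy = P_xx + S, this reduces to the familiar linear Kalman update with P_upd = (I - K) P_xx.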

The next subsection briefly describes the computation of the high-dimensional integrals present in the equations above.

3.1.3. Computation of Integrals Using Spherical-Radial Cubature Points. In order to determine the expectations in (10) using a numerical integration method, a spherical-radial cubature rule is applied. This method calculates the cubature points $\mathbf{X}_{j,k-1|k-1}$ as follows [36]:

$$
\mathbf{X}_{j,k-1|k-1} = \mathbf{U}_{k-1|k-1}\boldsymbol{\zeta}_{j} + \hat{\mathbf{x}}_{k-1|k-1},
\tag{15}
$$

where $\boldsymbol{\zeta}_{j} = \sqrt{\ell/2}\,[\mathbf{1}]_{j}$, $j = 1, \ldots, \ell$, $\ell = 2N$ denotes the total number of cubature points, and $\mathbf{U}_{k-1|k-1}$ stands for the square root of the error covariance matrix, that is,

$$
\mathbf{P}_{xx,k-1|k-1} = \mathbf{U}_{k-1|k-1}\mathbf{U}_{k-1|k-1}^{T}.
\tag{16}
$$

The cubature points are updated via the state equation

$$
\mathbf{X}^{*}_{j,k|k-1} = g\left(\mathbf{X}_{j,k-1|k-1}\right).
\tag{17}
$$

The propagated cubature points yield the state and error covariance estimates

$$
\begin{aligned}
\hat{\mathbf{x}}_{k|k-1} &= \frac{1}{\ell}\sum_{j=1}^{\ell}\mathbf{X}^{*}_{j,k|k-1}, \\
\mathbf{P}_{xx,k|k-1} &= \frac{1}{\ell}\sum_{j=1}^{\ell}\mathbf{X}^{*}_{j,k|k-1}\mathbf{X}^{*T}_{j,k|k-1} - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{x}}_{k|k-1}^{T} + \mathbf{Q}_{k-1}.
\end{aligned}
\tag{18}
$$

The integrals in (11) and (14) can be evaluated in a similar manner. The next subsection explains the estimation of parameters in the system.
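The cubature-point construction and time update in (15)-(18) can be sketched as follows. This is a minimal sketch under stated assumptions: the names `cubature_points` and `ckf_time_update` are hypothetical, the square root in (16) is taken to be a Cholesky factor, and $[\mathbf{1}]_j$ is interpreted, following the convention in [36], as the $j$th column of $[\mathbf{I}, -\mathbf{I}]$. The state function `g` is supplied by the caller.

```python
import numpy as np

def cubature_points(x_hat, P):
    """Return the 2N cubature points of (15) as the columns of an (N, 2N) array."""
    n = x_hat.size
    U = np.linalg.cholesky(P)                    # P = U U^T, as in (16)
    # zeta_j = sqrt(l/2) [1]_j with l = 2N, i.e. sqrt(N) times +/- unit vectors
    zeta = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return U @ zeta + x_hat[:, None]

def ckf_time_update(x_hat, P, g, Q):
    """Propagate the points through g (17) and form the moments in (18)."""
    X = cubature_points(x_hat, P)
    X_star = np.column_stack([g(X[:, j]) for j in range(X.shape[1])])
    n_points = X_star.shape[1]
    x_pred = X_star.sum(axis=1) / n_points
    P_pred = X_star @ X_star.T / n_points - np.outer(x_pred, x_pred) + Q
    return x_pred, P_pred
```

A convenient sanity check is a linear state function g(x) = A x, for which the rule is exact and the output equals A x̂ and A P Aᵀ + Q.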

3.2. Estimation of Sparse Parameters Using Kalman Filter. The state estimates are obtained using the CKF as described in the previous subsection. In order to estimate the unknown parameters in the system model, one of the most commonly used methods involves stacking the parameters with the states and estimating them together. The estimation process performed in this manner is called joint estimation. Another method for the estimation of parameters consists of a two-step recursive process, which is termed dual estimation. This process estimates the states in the first step and, with the assumption that the states are known, the parameters are estimated in the second step. These steps are repeated until the algorithm converges to the true values or until the available observations are exhausted. This paper makes use of the latter technique.

The vector $\mathbf{b}$ as defined in (6) is assumed to be evolving as a Gauss-Markov model. As discussed previously, the states are assumed to be known at this step. The system evolution equations can therefore be expressed as

$$
\begin{aligned}
\mathbf{b}_{k} &= \mathbf{b}_{k-1} + \boldsymbol{\eta}_{k-1}, \\
\mathbf{y}_{k} &= \mathbf{R}_{k}\mathbf{b}_{k} + \mathbf{e}_{k},
\end{aligned}
\tag{19}
$$

where $\boldsymbol{\eta}_{k}$ stands for the i.i.d. Gaussian noise and $\mathbf{R}_{k}$ is as defined in (8). It is observed that (19) is a system of linear equations with additive Gaussian noise, and therefore the Kalman filter is the optimal choice for the estimation of the parameter vector. The standard predict and update steps involved in the Kalman filter are summarized as follows:

$$
\begin{aligned}
\hat{\mathbf{b}}_{k|k-1} &= \hat{\mathbf{b}}_{k-1|k-1} + \boldsymbol{\eta}_{k}, \\
\mathbf{P}_{bb,k|k-1} &= \mathbf{P}_{bb,k-1|k-1} + \boldsymbol{\Sigma}_{\eta_k}, \\
\mathbf{u}_{k} &= \mathbf{y}_{k} - \mathbf{R}_{f_k}\hat{\mathbf{b}}_{k|k-1}, \\
\mathbf{K}_{G} &= \mathbf{P}_{bb,k|k-1}\mathbf{R}_{f_k}^{T}\left(\mathbf{R}_{f_k}\mathbf{P}_{bb,k|k-1}\mathbf{R}_{f_k}^{T} + \sigma_{e}^{2}\mathbf{I}\right)^{-1}, \\
\hat{\mathbf{b}}_{k|k} &= \hat{\mathbf{b}}_{k|k-1} + \mathbf{K}_{G}\mathbf{u}_{k}, \\
\mathbf{P}_{bb,k|k} &= \left(\mathbf{I} - \mathbf{K}_{G}\mathbf{R}_{f_k}\right)\mathbf{P}_{bb,k|k-1},
\end{aligned}
\tag{20}
$$

where $\mathbf{K}_{G}$ denotes the Kalman gain and $\mathbf{P}$ represents the error covariance matrix.
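One predict/update cycle of (20) can be sketched as below. This is an illustrative sketch rather than the paper's code: the function name `kf_parameter_update` and its argument names are hypothetical, and the prediction step assumes that the zero-mean process noise leaves the predicted mean unchanged and only inflates the covariance.

```python
import numpy as np

def kf_parameter_update(b_prev, P_prev, R_k, y_k, Sigma_eta, sigma_e2):
    """One Kalman filter cycle for the parameter vector, following (20).

    b_prev, P_prev : previous parameter estimate and its error covariance
    R_k            : regression matrix built from the (assumed known) states
    y_k            : observation at time k
    Sigma_eta      : process noise covariance
    sigma_e2       : measurement noise variance
    """
    # Predict: the noise is zero mean, so only the covariance grows.
    b_pred = b_prev
    P_pred = P_prev + Sigma_eta
    # Update: innovation, gain, corrected estimate and covariance.
    u = y_k - R_k @ b_pred
    S = R_k @ P_pred @ R_k.T + sigma_e2 * np.eye(R_k.shape[0])
    K = P_pred @ R_k.T @ np.linalg.inv(S)
    b_upd = b_pred + K @ u
    P_upd = (np.eye(b_prev.size) - K @ R_k) @ P_pred
    return b_upd, P_upd
```

Calling this function once per time sample, with `R_k` rebuilt from the current state estimate, corresponds to the parameter step of the dual estimation scheme described above.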

The Kalman filter algorithm is based on an $\ell_2$-norm minimization criterion. As gene networks are known to be highly sparse, the parameter vector is expected to have only a few nonzero values. A more accurate approach for estimating such a vector would be to introduce an additional constraint on its $\ell_1$-norm, which is the core idea in compressed sensing [38, 44]. Such an $\ell_1$-norm constraint provides a unique solution to the underdetermined set of equations [45]. Therefore, instead of a simple $\ell_2$-norm minimization, the following constrained optimization problem is considered:

$$
\min_{\hat{\mathbf{b}}_{k}} \left\| \mathbf{b}_{k} - \hat{\mathbf{b}}_{k} \right\|_{2}^{2} \quad \text{s.t.} \quad \left\| \hat{\mathbf{b}}_{k} \right\|_{1} \le \epsilon.
\tag{21}
$$

The importance of this constraint can be judged by the fact that, without it, the system would be rendered unidentifiable [43].

The problem (21) can be solved using a pseudomeasurement (PM) method, which incorporates the inequality constraint in (21) into the filtering process by assuming an artificial measurement $\|\mathbf{b}_{k}\|_{1} - \epsilon = 0$. This is concisely expressed as

$$
0 = \mathbf{R}\mathbf{b}_{k} - \epsilon, \qquad \mathbf{R}_{\tau} = \left[\operatorname{sign}\left(\hat{\mathbf{b}}_{\tau}(1)\right) \ \cdots \ \operatorname{sign}\left(\hat{\mathbf{b}}_{\tau}(N)\right)\right].
\tag{22}
$$

The value of the covariance matrix $\boldsymbol{\Sigma}_{\epsilon} = \sigma_{\epsilon}^{2}\mathbf{I}$ of the pseudonoise $\epsilon$ is selected in a similar manner as the process noise covariance in the EKF algorithm. However, it is found that large values of the variance, that is, $\sigma_{\epsilon}^{2} \ge 100$, prove sufficient in most cases [38]. Further details on selecting these parameters can be found in [38, 46]. The PM method solves (21) in a recursive manner for $K_{\tau}$ iterations using the following steps:

$$
\begin{aligned}
\mathbf{K}_{G}^{\tau} &= \mathbf{P}_{\tau}\mathbf{R}_{\tau}^{T}\left(\mathbf{R}_{\tau}\mathbf{P}_{\tau}\mathbf{R}_{\tau}^{T} + \boldsymbol{\Sigma}_{\epsilon}\right)^{-1}, \\
\hat{\mathbf{b}}_{\tau+1} &= \left(\mathbf{I} - \mathbf{K}_{G}^{\tau}\mathbf{R}_{\tau}\right)\hat{\mathbf{b}}_{\tau}, \\
\mathbf{P}_{\tau+1} &= \left(\mathbf{I} - \mathbf{K}_{G}^{\tau}\mathbf{R}_{\tau}\right)\mathbf{P}_{\tau}.
\end{aligned}
\tag{23}
$$

At each $k$th time instant, $\mathbf{P}_{bb,k|k}$ and $\hat{\mathbf{b}}_{k|k}$ obtained from (20) are considered as initial values, that is, $\hat{\mathbf{b}}_{1} = \hat{\mathbf{b}}_{k|k}$ and $\mathbf{P}_{1} = \mathbf{P}_{bb,k|k}$, which is the error covariance matrix. The value of $K_{\tau}$ is equal to the number of constraints, that is, the expected number of nonzero $b_{mn}$'s in the system. Possible ways for calculating $K_{\tau}$ include the minimum description length (MDL) principle and the Bayesian information criterion (BIC).

Algorithm 1: Network inference CKFS.

    Input time series data set $\mathbf{y}$
    Initialize $I$, $K$, $\phi_{0}$, $\mathbf{x}_{0}$
    for $k = 1, \ldots, K$ do
        Find the state estimates using CKF following the time and measurement update steps in Section 3
        Estimate parameters $\mathbf{b}_{k}$ from $\mathbf{x}_{k}$ and $\mathbf{y}_{k}$ using (20)
        for $\tau = 1, \ldots, K_{\tau}$ do
            Update the parameters $\mathbf{b}_{k}$ using (23)
        end for
    end for
    return
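The pseudomeasurement iterations in (23) can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `pm_refine` is hypothetical, `n_iter` plays the role of $K_\tau$, and because $\mathbf{R}_\tau$ is a single row the pseudonoise covariance reduces to the scalar `sigma_eps2`.

```python
import numpy as np

def pm_refine(b, P, sigma_eps2=100.0, n_iter=10):
    """Refine a Kalman estimate with the l1 pseudomeasurement steps of (23).

    b, P       : estimate and error covariance produced by (20)
    sigma_eps2 : pseudonoise variance (values >= 100 are suggested in [38])
    n_iter     : number of iterations, K_tau in the text
    """
    b = np.array(b, dtype=float)          # work on copies of the inputs
    P = np.array(P, dtype=float)
    I = np.eye(b.size)
    for _ in range(n_iter):
        R_tau = np.sign(b)[None, :]                  # row of signs, as in (22)
        S = R_tau @ P @ R_tau.T + sigma_eps2         # 1x1 innovation variance
        K = P @ R_tau.T / S                          # gain, shape (N, 1)
        b = (I - K @ R_tau) @ b
        P = (I - K @ R_tau) @ P
    return b, P
```

Since $\mathbf{R}_\tau\hat{\mathbf{b}}_\tau = \|\hat{\mathbf{b}}_\tau\|_1$, each iteration moves every nonzero entry toward zero by an amount proportional to the current $\ell_1$-norm and to that entry's uncertainty in $\mathbf{P}_\tau$.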

3.3. Inference Algorithm. The network inference algorithm is summarized in Algorithm 1. The algorithm consists of a recursive process which repeats itself for the number of observations present in the time-series data. For each time sample, the state estimate is obtained using the CKF and the parameter estimate is obtained using the KF. Since the parameters are expected to be sparse, the estimates are then refined further using the CSKF algorithm. This iterative process results in a simple and accurate algorithm for gene network inference while considering a complex nonlinear model.

4. Cramér-Rao Bound

The performance of an estimator can be judged by comparing it with theoretical lower bounds proposed in parameter estimation theory. The CRB establishes a lower bound on the MSE of an unbiased estimator [47]. In particular, the CRB states that the covariance matrix of the estimator $\hat{\mathbf{b}}$ is lower bounded by

$$
E\left[\left(\hat{\mathbf{b}} - \mathbf{b}\right)\left(\hat{\mathbf{b}} - \mathbf{b}\right)^{T}\right] \succeq \left[\mathbf{I}(\mathbf{b})\right]^{-1},
\tag{24}
$$

where the matrix inequality $\succeq$ is to be interpreted in the semidefinite sense and $\mathbf{I}(\mathbf{b})$ is the Fisher information matrix (FIM):

$$
\mathbf{I}(\mathbf{b}) = E\left[\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^{T}\right].
\tag{25}
$$

The CRB for gene network inference can be calculated as follows. By stacking all the observations for $k = 1, \ldots, K$, (7) can be written compactly in the matrix form

$$
\mathbf{y} = \mathbf{R}\mathbf{b} + \mathbf{e},
\tag{26}
$$

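where $\mathbf{y} = [\mathbf{y}_{1}^{T}, \ldots, \mathbf{y}_{K}^{T}]^{T}$, $\mathbf{R} = [\mathbf{R}_{1}^{T}, \ldots, \mathbf{R}_{K}^{T}]^{T}$, and $\mathbf{e} = [\mathbf{e}_{1}^{T}, \ldots, \mathbf{e}_{K}^{T}]^{T}$. The PDF $p(\mathbf{y} \mid \mathbf{b})$ is expressed as

$$
p(\mathbf{y} \mid \mathbf{b}) = C \exp\left(-\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^{T}(\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_{e}^{2}}\right),
\tag{27}
$$

where $C$ is a constant. The derivative of $\ln p(\mathbf{y} \mid \mathbf{b})$ can be expressed as

$$
\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} = -\frac{\partial}{\partial \mathbf{b}}\left[\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^{T}(\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_{e}^{2}}\right] = \frac{\mathbf{R}^{T}\mathbf{y} - \mathbf{R}^{T}\mathbf{R}\mathbf{b}}{\sigma_{e}^{2}}.
\tag{28}
$$

It now follows that

$$
\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^{T} = \frac{\mathbf{R}^{T}(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^{T}\mathbf{R}}{\sigma_{e}^{4}}.
\tag{29}
$$

By taking the expectation of (29), the FIM in (25) is given by

$$
\mathbf{I}(\mathbf{b}) = \frac{\mathbf{R}^{T}\mathbf{R}}{\sigma_{e}^{2}}.
\tag{30}
$$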
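The expectation step leading to (30) can be spelled out as follows, assuming (as the Gaussian form of (27) implies) that the stacked noise $\mathbf{e}$ has zero mean and covariance $\sigma_{e}^{2}\mathbf{I}$, and treating $\mathbf{R}$ as known:

```latex
\begin{aligned}
E\left[(\mathbf{y}-\mathbf{R}\mathbf{b})(\mathbf{y}-\mathbf{R}\mathbf{b})^{T}\right]
  &= E\left[\mathbf{e}\mathbf{e}^{T}\right] = \sigma_{e}^{2}\mathbf{I}, \\
\mathbf{I}(\mathbf{b})
  &= \frac{\mathbf{R}^{T}\left(\sigma_{e}^{2}\mathbf{I}\right)\mathbf{R}}{\sigma_{e}^{4}}
   = \frac{\mathbf{R}^{T}\mathbf{R}}{\sigma_{e}^{2}}.
\end{aligned}
```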

The inverse of the FIM in (30) can be used to place a lower bound on the estimation error of the parameter vector $\mathbf{b}$. Figure 2 shows the comparison of the MSE of the CKFS algorithm with the CRB as a function of the number of samples $K$ for one representative gene from the eight-gene network considered in Section 5.1. It is observed that the MSE of the estimated parameters decreases with increasing number of samples.
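Numerically, the bound amounts to inverting the FIM of the stacked linear model. The sketch below is only an illustration under that model; the function name `crb` is hypothetical, and the stacked regression matrix is assumed to have full column rank so that the inverse exists.

```python
import numpy as np

def crb(R, sigma_e2):
    """Cramér-Rao bound for y = R b + e with noise variance sigma_e2.

    Returns the inverse of the Fisher information matrix R^T R / sigma_e2;
    its diagonal lower-bounds the MSE of each entry of an unbiased estimate.
    """
    fim = R.T @ R / sigma_e2
    return np.linalg.inv(fim)
```

Stacking more observations adds rows to `R`, which can only increase the FIM in the semidefinite sense, so the diagonal of the bound does not grow as samples are added.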

5. Results and Discussion

The simulation results of the CKFS algorithm are discussed in this section. The performance is first tested on synthetic data obtained from randomly generated Boolean networks under various scenarios and performance metrics. The algorithm is then assessed on the DREAM4 networks and the IRMA network.

5.1. Synthetic Data. Time-series data is produced from randomly generated Boolean networks using the system model (3) and (5). Two scenarios are considered for this purpose.

Figure 2: Cramér-Rao bound on the estimation of parameters. The MSE for one of the representative θ is shown here for a network consisting of 8 vertices. (Plot of log10 MSE, from 10^-7 to 10^0, against sample size, from 4 to 16, with curves for CRB and CKFS.)

First, the comparison is performed by varying the sample size while keeping the network size fixed. The gene network consists of 8 genes and 20 edges. In terms of network estimation, if the algorithm predicts an edge between two nodes which is not present in reality, an error referred to as false alarm error (F) is said to have occurred. Another situation is the indication of the absence of an edge in the graph which is, in fact, present in the real network. This kind of error is termed missed detection (M). The summation of these two errors, normalized over the total number of edges in the network, yields the Hamming distance. It is also important to consider the probability of predicting the true connections correctly, which is assessed by the true connections (T) metric. An algorithm with low Hamming distance and small false alarm error is particularly desirable, as predicting an edge erroneously can be troublesome for biologists. True connections indicate the reliability of the predictions. Figure 3 illustrates the performance of the CKFS algorithm and that of the EKF algorithm proposed in [34] in terms of the metrics described above. It is important to mention here that the same system model is assumed by both the CKFS and EKF algorithms for the purpose of this simulation. These metrics are the same as those used in [15]. The variances of both the system and measurement noises, $\sigma_{w}^{2}$ and $\sigma_{v}^{2}$, respectively, are taken to be $10^{-5}$ in all the simulations and are assumed to be known. It is noticed that EKF has a slightly lower false alarm rate when the number of samples is small; however, as the number of samples increases, CKFS yields a lower false alarm error. The Hamming distance for CKFS is also smaller than that for EKF, indicating a lower cumulative error. True connections show a consistent behavior for the two algorithms when the number of samples is increased, with CKFS predicting connections more accurately. These experiments show the superiority of CKFS in terms of lower error rate.
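The false alarm, missed detection, Hamming distance, and true connection metrics can be computed from a pair of adjacency matrices as sketched below. This is an illustrative sketch: the function name `edge_metrics` and the dictionary keys are hypothetical, and every metric is normalized by the number of edges in the true network, following the caption of Figure 3.

```python
import numpy as np

def edge_metrics(true_adj, pred_adj):
    """Compare a predicted adjacency matrix with the true one.

    Entry (i, j) is nonzero when an edge from gene i to gene j is present.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    pred_adj = np.asarray(pred_adj, dtype=bool)
    n_edges = true_adj.sum()
    false_alarm = np.sum(pred_adj & ~true_adj)   # predicted but not real (F)
    missed = np.sum(~pred_adj & true_adj)        # real but not predicted (M)
    true_conn = np.sum(pred_adj & true_adj)      # correctly predicted (T)
    return {
        "false_alarm": false_alarm / n_edges,
        "missed": missed / n_edges,
        "hamming": (false_alarm + missed) / n_edges,
        "true_connections": true_conn / n_edges,
    }
```

For a three-gene network with true edges 0→1, 1→2, 2→0 and predicted edges 0→1, 0→2, 2→0, this gives one false alarm and one missed detection, so the Hamming distance is 2/3 and the true connections metric is 2/3.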

Figure 3: (a), (b), and (c): False alarm errors, Hamming distance, and true connections. The synthetic networks consist of 8 vertices and 20 edges. The metric is normalized over the number of edges. CKFS gives lower error and predicts more true connections with the increase in the sample size of data. (Each panel plots the metric against sample size, from 5 to 40, for EKF and CKFS.)

To obtain a more rigorous evaluation, the performance of the algorithms is then compared in a scenario in which the sample size is fixed and the networks are of different sizes. The receiver operating characteristic (ROC) curves are plotted as performance measures. A higher area under the ROC curve (AUROC) shows more true positives for a given false positive rate and therefore indicates better classification. The performance of CKFS($N$, $E$, $K$) and EKF($N$, $E$, $K$) is shown in Figure 4, where $N$ stands for the number of nodes, $E$ represents the number of edges, and $K$ denotes the number of time points. It is observed that the CKFS exhibits performance superior to that of the EKF for networks of different sizes.
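An ROC curve and its area can be obtained by ranking candidate edges by a confidence score and sweeping a threshold down the ranking. The sketch below is only an illustration: the function names are hypothetical, the scores stand for whatever edge confidence an inference method produces (for example, the magnitude of an estimated parameter), and tied scores are not treated specially.

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by thresholding the ranked scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")     # highest score first
    ranked = labels[order]
    tp = np.cumsum(ranked)                         # true positives so far
    fp = np.cumsum(~ranked)                        # false positives so far
    tpr = np.concatenate([[0.0], tp / ranked.sum()])
    fpr = np.concatenate([[0.0], fp / (~ranked).sum()])
    return fpr, tpr

def auroc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_curve_points(scores, labels)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A perfect ranking gives an area of 1, a fully reversed ranking gives 0, and a random ranking is expected to give about 0.5.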

The complexity of the two algorithms is compared for synthetically generated networks with the number of genes equal to 10, 20, 30, and 40. The sample size is kept to 50 time points for each of these networks, and the run time for the EKF and CKFS algorithms is calculated as shown in Table 1. It is noted that EKF is faster for smaller network sizes, but as the network size increases, its run time gets much larger than that for CKFS. The main reason for this is that EKF [34] estimates the states and parameters by stacking them together, which requires large-sized matrix multiplications at each iteration.

The benefit associated with performing dual estimation, as in CKFS, is that the parameters are estimated separately from the states. Since the system is linear and one-to-one for parameters, inversion of much smaller matrices can be performed, reducing the computational complexity of the CKFS algorithm. CKFS is therefore particularly attractive for large-sized networks.

Table 1: Run time in seconds for EKF and CKFS algorithms for varying network sizes for synthetically generated data. The number of sample points is fixed to 50.

Number of genes    10      20      30      40
EKF                0.16    1.9     16.5    84
CKFS               1.2     4.3     11.5    24.1

Figure 4: ROC curves for the performance of CKFS and EKF using synthetic data ($N$, $E$, $K$). (a), (b), and (c): (5, 10, 20), (10, 12, 20), and (15, 19, 20). The area under the ROC curve for CKFS is more than that for EKF for various sized networks. (Each panel plots true positive rate against false positive rate for CKFS and EKF.)

5.2. DREAM4 Gene Networks. Several in silico networks have been produced in order to benchmark the performance of gene network inference algorithms. The dialogue on reverse engineering assessment and methods (DREAM) in silico networks serve as one of the popular methods used for this purpose [39, 48]. In this section, the performance of the CKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge.

Figure 5: The inferred IRMA networks. (a), (b), and (c): Gold standard, inferred network using CKFS, and inferred network using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate the edges that are correct but their directions are reversed, and red arrows indicate false positives. (Each panel shows the five genes GAL80, GAL4, SWI5, CBF1, and ASH1.)

Table 2: Area under the ROC curve (AUROC) and, in parentheses, area under the PR curve (AUPR) for DREAM4 10-gene networks for the five different networks.

Algorithm      Network 1      Network 2      Network 3      Network 4      Network 5
ODE [39]       0.62 (0.27)    0.63 (0.32)    0.58 (0.21)    0.63 (0.23)    0.68 (0.25)
CKFS           0.63 (0.40)    0.67 (0.50)    0.72 (0.50)    0.75 (0.49)    0.81 (0.42)
Random [39]    0.55 (0.18)    0.55 (0.19)    0.55 (0.17)    0.57 (0.17)    0.56 (0.16)

Table 3: Area under the ROC curve (AUROC) and, in parentheses, area under the PR curve (AUPR) for DREAM4 100-gene networks for the five different networks.

Algorithm      Network 1       Network 2       Network 3       Network 4       Network 5
ODE [39]       0.55 (0.02)     0.55 (0.03)     0.60 (0.03)     0.54 (0.02)     0.59 (0.03)
CKFS           0.67 (0.13)     0.57 (0.08)     0.60 (0.10)     0.62 (0.10)     0.60 (0.07)
Random [39]    0.50 (0.002)    0.50 (0.002)    0.50 (0.002)    0.50 (0.002)    0.50 (0.002)

Five networks are produced using the known GRNs of Escherichia coli and Saccharomyces cerevisiae. The data sets for each of the 10-gene networks consist of 21 data points for five different perturbations. The inference is performed by using all the perturbations. The 100-gene network consists of data sets for ten perturbations. AUROC and area under the precision-recall curve (AUPR) are calculated for the five networks of both data sets and shown in Tables 2 and 3, respectively. The quantities precision and recall are defined as $P = T/(T + F)$ and $R = T/(T + M)$, respectively. For comparison purposes, the values of the two quantities for the time-series network identification (TSNI) algorithm that exploits ordinary differential equations are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.
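The precision and recall definitions $P = T/(T + F)$ and $R = T/(T + M)$ can be evaluated along a ranked edge list to trace a precision-recall curve. The sketch below is an illustration under assumptions: the function name `pr_curve_points` is hypothetical, at least one true edge is assumed to exist, and the choice of integration rule for the AUPR is left open because conventions differ.

```python
import numpy as np

def pr_curve_points(scores, labels):
    """Return (recall, precision) after accepting the top-n ranked edges, n = 1, 2, ..."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")   # highest confidence first
    ranked = labels[order]
    T = np.cumsum(ranked)            # true connections among accepted edges
    F = np.cumsum(~ranked)           # false alarms among accepted edges
    M = ranked.sum() - T             # true edges not yet accepted (missed)
    precision = T / (T + F)
    recall = T / (T + M)
    return recall, precision
```

For four candidate edges ranked as true, false, true, false, the precision sequence is 1, 1/2, 2/3, 1/2 and the recall sequence is 1/2, 1/2, 1, 1.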

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which represents the expression samples at 21 time points. The switch-on data consists of 16 sample points and is obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be identical to those in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because the genes are known to interact with only a few other genes. The use of CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated for synthetic data for different network sizes as well as varying sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS gives advantages over EKF in terms of smaller run time for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using DREAM4 10-gene and 100-gene networks and IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X Zhou X Wang and E R Dougherty Genomic NetworksStatistical Inference from Microarray Data John Wiley amp SonsNew York NY USA 2006

[2] H Kitano ldquoComputational systems biologyrdquo Nature vol 420pp 206ndash210 2002

[3] X Zhou and S T CWongComputational Systems Bioinformat-ics World Scientific River Edge NJ USA 2008

[4] X Cai and X Wang ldquoStochastic modeling and simulation ofgene networksrdquo IEEE Signal Processing Magazine vol 24 no 1pp 27ndash36 2007

[5] D Yue J Meng M Lu C L P Chen M Guo and Y HuangldquoUnderstanding micro-RNA regulation a computational per-spectiverdquo IEEE Signal ProcessingMagazine vol 29 no 1 pp 77ndash88 2012

[6] R Pal S Bhattacharya and M U Caglar ldquoRobust approachesfor genetic regulatory network modeling and intervention areview of recent advancesrdquo IEEE Signal Processing Magazinevol 29 no 1 pp 66ndash76 2012

[7] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[8] T Schlitt and A Brazma ldquoCurrent approaches to gene regula-tory networkmodellingrdquo BMC Bioinformatics vol 8 no 6 p 92007

[9] H D Jong ldquoModeling and simulation of genetic regulatoy sys-tems a literature reviewrdquo Journal of Computational Biology vol9 no 1 pp 67ndash103 2002

[10] I Nachman A Regev and N Friedman ldquoInferring quantitativemodels of regulatory networks from expression datardquo Bioinfor-matics vol 20 no 1 pp i248ndashi256 2004

[11] C D Giurcaneanu I Tabus and J Astola ldquoClustering timeseries gene expression data based on sum-of-exponentials fit-tingrdquo EURASIP Journal on Advances in Signal Processing vol2005 no 8 Article ID 358568 pp 1159ndash1173 2005

[12] C D Giurcaneanu I Tabus J Astola J Ollila and M VihinenldquoFast iterative gene clustering based on information theoreticcriteria for selecting the cluster structurerdquo Journal of Computa-tional Biology vol 11 no 4 pp 660ndash682 2004

[13] X Cai andG B Giannakis ldquoIdentifying differentially expressedgenes in microarray experiments with model-based varianceestimationrdquo IEEE Transactions on Signal Processing vol 54 no6 pp 2418ndash2426 2006

[14] X Zhou XWang and E R Dougherty ldquoGene clustering basedon cluster-wide mutual informationrdquo Journal of ComputationalBiology vol 11 no 1 pp 151ndash165 2004

[15] W Zhao E Serpedin and E R Dougherty ldquoInferring connec-tivity of genetic regulatory networks using informationtheoreticcriteriardquo IEEEACMTransactions onComputational Biology andBioinformatics vol 5 no 2 pp 262ndash274 2008

[16] J Dougherty I Tabus and J Astola ldquoInference of gene reg-ulatory networks based on a universal minimum descriptionlengthrdquo Eurasip Journal on Bioinformatics and Systems Biologyvol 2008 Article ID 482090 2008

[17] L Qian H Wang and E R Dougherty ldquoInference of noisynonlinear differential equation models for gene regulatory net-works using genetic programming and Kalman filteringrdquo IEEETransactions on Signal Processing vol 56 no 7 pp 3327ndash33392008

[18] W Zhao E Serpedin and E R Dougherty ldquoInferring gene reg-ulatory networks from time series data using the minimumdescription length principlerdquo Bioinformatics vol 22 no 17 pp2129ndash2135 2006

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling gene expression data using dynamic Bayesian networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D'Alche-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d'Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS '11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C. A. Penfold and D. L. Wild, “How to infer gene networks from expression profiles, revisited,” Interface Focus, pp. 857–870, 2011.

[40] I. Cantone, L. Marucci, F. Iorio et al., “A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches,” Cell, vol. 137, no. 1, pp. 172–181, 2009.

[41] Y. Huang, I. M. Tienda-Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 76–97, 2009.

[42] Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.

[43] H. Xiong and Y. Choe, “Structural systems identification of genetic regulatory networks,” Bioinformatics, vol. 24, no. 4, pp. 553–560, 2008.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[45] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[46] J. D. Geeter, H. V. Brussel, and J. D. Schutter, “A smoothly constrained Kalman filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1171–1177, 1997.

[47] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New York, NY, USA, 1993.

[48] http://wiki.c2b2.columbia.edu/dream


parameter vector. The standard predict and update steps involved in the Kalman filter are summarized as follows:

$$
\begin{aligned}
\hat{\mathbf{b}}_{k|k-1} &= \hat{\mathbf{b}}_{k-1|k-1} + \boldsymbol{\eta}_k, \\
\mathbf{P}^{bb}_{k|k-1} &= \mathbf{P}^{bb}_{k-1|k-1} + \boldsymbol{\Sigma}_{\eta_k}, \\
\mathbf{u}_k &= \mathbf{y}_k - \mathbf{R}_{f_k}\hat{\mathbf{b}}_{k|k-1}, \\
\mathbf{K}_G &= \mathbf{P}^{bb}_{k|k-1}\mathbf{R}_{f_k}^{T}\left(\mathbf{R}_{f_k}\mathbf{P}^{bb}_{k|k-1}\mathbf{R}_{f_k}^{T} + \sigma_e^{2}\mathbf{I}\right)^{-1}, \\
\hat{\mathbf{b}}_{k|k} &= \hat{\mathbf{b}}_{k|k-1} + \mathbf{K}_G\mathbf{u}_k, \\
\mathbf{P}^{bb}_{k|k} &= \left(\mathbf{I} - \mathbf{K}_G\mathbf{R}_{f_k}\right)\mathbf{P}^{bb}_{k|k-1},
\end{aligned}
\tag{20}
$$

where $\mathbf{K}_G$ denotes the Kalman gain and $\mathbf{P}$ represents the error covariance matrix.
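The predict and update steps in (20) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `kf_parameter_step` and its argument names are hypothetical, and the sketch assumes that $\boldsymbol{\eta}_k$ is zero mean, so the predicted mean equals the previous estimate and only the covariance grows.

```python
import numpy as np

def kf_parameter_step(b_prev, P_prev, y_k, R_k, Sigma_eta, sigma_e2):
    """One predict/update cycle of (20) for the parameter vector b."""
    # Predict. Assumption: eta_k is zero mean, so the predicted mean is
    # unchanged and only the covariance grows by Sigma_eta.
    b_pred = np.asarray(b_prev, dtype=float)
    P_pred = P_prev + Sigma_eta
    # Innovation u_k and its covariance.
    u = y_k - R_k @ b_pred
    S = R_k @ P_pred @ R_k.T + sigma_e2 * np.eye(R_k.shape[0])
    # Kalman gain K_G, followed by the measurement update.
    K = P_pred @ R_k.T @ np.linalg.inv(S)
    b_new = b_pred + K @ u
    P_new = (np.eye(b_pred.size) - K @ R_k) @ P_pred
    return b_new, P_new
```

With a very uninformative prior (large `P_prev`) and small measurement noise, a single call pulls the estimate almost entirely onto the measurement, which is the expected behavior of the gain in (20).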

The Kalman filter algorithm is based on an $\ell_2$-norm minimization criterion. As gene networks are known to be highly sparse, the parameter vector is expected to have only a few nonzero values. A more accurate approach for estimating such a vector is to introduce an additional constraint on its $\ell_1$-norm, which is the core idea in compressed sensing [38, 44]. Under suitable conditions, such an $\ell_1$-norm constraint provides a unique solution to the underdetermined set of equations [45]. Therefore, instead of a simple $\ell_2$-norm minimization, the following constrained optimization problem is considered:

$$
\min_{\hat{\mathbf{b}}_k} \left\| \hat{\mathbf{b}}_k - \mathbf{b}_k \right\|_2^2 \quad \text{s.t.} \quad \left\| \hat{\mathbf{b}}_k \right\|_1 \le \epsilon. \tag{21}
$$

The importance of this constraint can be judged by the fact that, without it, the system would be rendered unidentifiable [43].

The problem (21) can be solved using a pseudomeasurement (PM) method, which incorporates the inequality constraint in (21) into the filtering process by assuming an artificial measurement $\|\hat{\mathbf{b}}_k\|_1 - \epsilon = 0$. This is concisely expressed as

$$
0 = \mathbf{R}_\tau \hat{\mathbf{b}}_k - \epsilon, \qquad \mathbf{R}_\tau = \left[\operatorname{sign}\left(\hat{b}_\tau(1)\right), \ldots, \operatorname{sign}\left(\hat{b}_\tau(N)\right)\right]. \tag{22}
$$

The value of the covariance matrix $\boldsymbol{\Sigma}_\epsilon = \sigma_\epsilon^2\mathbf{I}$ of the pseudonoise $\epsilon$ is selected in a similar manner as the process noise covariance in the EKF algorithm. However, it is found that large values of the variance, that is, $\sigma_\epsilon^2 \ge 100$, prove sufficient in most cases [38]. Further details on selecting these parameters can be found in [38, 46]. The PM method solves (21) in a recursive manner for $K_\tau$ iterations using the following steps:

$$
\begin{aligned}
\mathbf{K}_G^{\tau} &= \mathbf{P}_\tau\mathbf{R}_\tau^{T}\left(\mathbf{R}_\tau\mathbf{P}_\tau\mathbf{R}_\tau^{T} + \boldsymbol{\Sigma}_\epsilon\right)^{-1}, \\
\hat{\mathbf{b}}_{\tau+1} &= \left(\mathbf{I} - \mathbf{K}_G^{\tau}\mathbf{R}_\tau\right)\hat{\mathbf{b}}_\tau, \\
\mathbf{P}_{\tau+1} &= \left(\mathbf{I} - \mathbf{K}_G^{\tau}\mathbf{R}_\tau\right)\mathbf{P}_\tau.
\end{aligned}
\tag{23}
$$

At each $k$th time instant, $\mathbf{P}^{bb}_{k|k}$ and $\hat{\mathbf{b}}_{k|k}$ obtained from (20) are considered as initial values, that is, $\hat{\mathbf{b}}_1 = \hat{\mathbf{b}}_{k|k}$ and $\mathbf{P}_1 = \mathbf{P}^{bb}_{k|k}$, which is the error covariance matrix. The value of $K_\tau$ is equal to the number of constraints, that is, the expected number of nonzero $b_{mn}$'s in the system. Possible ways for calculating $K_\tau$ include the minimum description length (MDL) principle and the Bayesian information criterion (BIC).

Algorithm 1: Network inference (CKFS).

(1) Input time series data set $\mathbf{y}$
(2) Initialize $I$, $K$, $\phi_0$, $\mathbf{x}_0$
(3) for $k = 1, \ldots, K$ do
(4)     Find the state estimates using CKF, following the time and measurement update steps in Section 3
(5)     Estimate parameters $\mathbf{b}_k$ from $\mathbf{x}_k$ and $\mathbf{y}_k$ using (20)
(6)     for $\tau = 1, \ldots, K_\tau$ do
(7)         Update the parameters $\mathbf{b}_k$ using (23)
(8)     end for
(9) end for
(10) return
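The pseudomeasurement recursion in (23) can be sketched as follows. This is a hedged illustration, not the reference implementation of [38]: the name `pm_refine` is hypothetical, the pseudomeasurement is treated as a scalar so that $\boldsymbol{\Sigma}_\epsilon$ reduces to the scalar variance `sigma_eps2`, and the default value of 100 simply follows the rule of thumb quoted above.

```python
import numpy as np

def pm_refine(b, P, n_iter, sigma_eps2=100.0):
    """Refine a Kalman estimate (b, P) with n_iter pseudomeasurement steps of (23)."""
    b = np.asarray(b, dtype=float).copy()
    P = np.asarray(P, dtype=float).copy()
    I = np.eye(b.size)
    for _ in range(n_iter):
        # R_tau is the 1 x N row of signs of the current estimate, as in (22).
        R = np.sign(b).reshape(1, -1)
        # Scalar innovation variance of the pseudomeasurement (a 1 x 1 array).
        S = R @ P @ R.T + sigma_eps2
        # Gain K_G^tau (N x 1), then the updates of the estimate and covariance.
        K = P @ R.T / S
        b = (I - K @ R) @ b
        P = (I - K @ R) @ P
    return b, P
```

Since $\mathbf{R}_\tau\hat{\mathbf{b}}_\tau = \|\hat{\mathbf{b}}_\tau\|_1$, each step moves the estimate along $\mathbf{P}_\tau\mathbf{R}_\tau^T$ by an amount proportional to its current $\ell_1$-norm, which is what drives small entries towards zero.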

3.3. Inference Algorithm. The network inference algorithm is summarized in Algorithm 1. The algorithm consists of a recursive process which repeats itself for the number of observations present in the time-series data. For each time sample, the state estimate is obtained using the CKF and the parameter estimate is obtained using the KF. Since the parameters are expected to be sparse, the estimates are then refined further using the CSKF algorithm. This iterative process results in a simple and accurate algorithm for gene network inference while considering a complex nonlinear model.
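The control flow of Algorithm 1 can be sketched as a single loop. The sketch below is only a skeleton under stated assumptions: the CKF state update and the construction of the regression matrix $\mathbf{R}_{f_k}$ are not reproduced here and are passed in as the hypothetical callables `state_step` and `regressor`; the noise $\boldsymbol{\eta}_k$ is assumed zero mean; and the pseudomeasurement is treated as a scalar.

```python
import numpy as np

def ckfs_loop(Y, x0, b0, P0, state_step, regressor, Sigma_eta, sigma_e2,
              n_pm_iter, sigma_eps2=100.0):
    """Skeleton of Algorithm 1: state step, KF parameter update (20), PM refinement (23)."""
    x = x0
    b = np.asarray(b0, dtype=float).copy()
    P = np.asarray(P0, dtype=float).copy()
    I = np.eye(b.size)
    for y_k in Y:
        # Step (4): state estimate; the CKF itself is supplied by the caller.
        x = state_step(x, b, y_k)
        R = regressor(x)                      # hypothetical builder of R_{f_k}
        # Step (5): Kalman predict and update of the parameters, as in (20).
        P = P + Sigma_eta
        S = R @ P @ R.T + sigma_e2 * np.eye(R.shape[0])
        K = P @ R.T @ np.linalg.inv(S)
        b = b + K @ (y_k - R @ b)
        P = (I - K @ R) @ P
        # Steps (6)-(8): pseudomeasurement refinement, as in (23).
        for _ in range(n_pm_iter):
            r = np.sign(b).reshape(1, -1)
            Kt = P @ r.T / (r @ P @ r.T + sigma_eps2)
            b = (I - Kt @ r) @ b
            P = (I - Kt @ r) @ P
    return x, b, P
```

Keeping the state filter behind a callable mirrors the dual-estimation structure described above: the parameter filter only ever sees the current state estimate through $\mathbf{R}_{f_k}$.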

4. Cramér-Rao Bound

The performance of an estimator can be judged by comparing it with theoretical lower bounds proposed in parameter estimation theory. The CRB establishes a lower bound on the MSE of an unbiased estimator [47]. In particular, the CRB states that the covariance matrix of the estimator $\hat{\mathbf{b}}$ is lower bounded by

$$
E\left[\left(\hat{\mathbf{b}} - \mathbf{b}\right)\left(\hat{\mathbf{b}} - \mathbf{b}\right)^{T}\right] \succeq \left[\mathbf{I}(\mathbf{b})\right]^{-1}, \tag{24}
$$

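where the matrix inequality $\succeq$ is to be interpreted in the semidefinite sense and $\mathbf{I}(\mathbf{b})$ is the Fisher information matrix (FIM):

$$
\mathbf{I}(\mathbf{b}) = E\left[\left(\frac{\partial \ln f(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln f(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^{T}\right]. \tag{25}
$$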

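The CRB for gene network inference can be calculated as follows. By stacking all the observations for $k = 1, \ldots, K$, (7) can be written compactly in the matrix form

$$
\mathbf{y} = \mathbf{R}\mathbf{b} + \mathbf{e}, \tag{26}
$$

where $\mathbf{y} = [\mathbf{y}_1^T, \ldots, \mathbf{y}_K^T]^T$, $\mathbf{R} = [\mathbf{R}_1^T, \ldots, \mathbf{R}_K^T]^T$, and $\mathbf{e} = [\mathbf{e}_1^T, \ldots, \mathbf{e}_K^T]^T$. The PDF $p(\mathbf{y} \mid \mathbf{b})$, written as $f(\mathbf{y} \mid \mathbf{b})$ in (25), is expressed as

$$
p(\mathbf{y} \mid \mathbf{b}) = C \exp\left(-\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^{T}(\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^{2}}\right), \tag{27}
$$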

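where $C$ is a constant. The derivative of $\ln p(\mathbf{y} \mid \mathbf{b})$ can be expressed as

$$
\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} = -\frac{\partial}{\partial \mathbf{b}}\left[\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^{T}(\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^{2}}\right] = \frac{\mathbf{R}^{T}\mathbf{y} - \mathbf{R}^{T}\mathbf{R}\mathbf{b}}{\sigma_e^{2}}. \tag{28}
$$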

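It now follows that

$$
\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^{T} = \frac{\mathbf{R}^{T}(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^{T}\mathbf{R}}{\sigma_e^{4}}. \tag{29}
$$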

By taking the expectation of (29), the FIM in (25) is given by

$$
\mathbf{I}(\mathbf{b}) = \frac{\mathbf{R}^{T}\mathbf{R}}{\sigma_e^{2}}. \tag{30}
$$
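The expectation step that leads from (29) to (30) can be spelled out. The lines below are a short worked derivation, assuming (as implied by (27)) that the stacked noise $\mathbf{e}$ is zero mean with covariance $\sigma_e^2\mathbf{I}$ and that $\mathbf{R}$ is treated as known:

```latex
\begin{aligned}
E\left[(\mathbf{y}-\mathbf{R}\mathbf{b})(\mathbf{y}-\mathbf{R}\mathbf{b})^{T}\right]
  &= E\left[\mathbf{e}\mathbf{e}^{T}\right] = \sigma_e^{2}\mathbf{I}, \\
\mathbf{I}(\mathbf{b})
  &= \frac{\mathbf{R}^{T}\left(\sigma_e^{2}\mathbf{I}\right)\mathbf{R}}{\sigma_e^{4}}
   = \frac{\mathbf{R}^{T}\mathbf{R}}{\sigma_e^{2}}, \\
\left[\mathbf{I}(\mathbf{b})\right]^{-1}
  &= \sigma_e^{2}\left(\mathbf{R}^{T}\mathbf{R}\right)^{-1}.
\end{aligned}
```

The last line requires $\mathbf{R}^T\mathbf{R}$ to be invertible, that is, $\mathbf{R}$ must have full column rank.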

The inverse of the FIM in (30) can be used to place a lower bound on the estimation error of the parameter vector $\mathbf{b}$. Figure 2 shows the comparison of the MSE of the CKFS algorithm with the CRB as a function of the number of samples $K$ for one representative gene from the eight-gene network considered in Section 5.1. It is observed that the MSE of the estimated parameters decreases with increasing number of samples.
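A small numerical sketch shows how the bound implied by (30) can be evaluated. The function name `crb_linear_gaussian` and the toy regression matrices are assumptions made for illustration only; they are not the matrices used to produce Figure 2.

```python
import numpy as np

def crb_linear_gaussian(R, sigma_e2):
    """Inverse of the FIM in (30): sigma_e^2 (R^T R)^{-1} for y = R b + e."""
    fim = R.T @ R / sigma_e2
    return np.linalg.inv(fim)

# Toy illustration: stacking more observation blocks (larger K) shrinks the bound.
R_block = np.eye(3)
for K in (1, 2, 4, 8):
    R = np.vstack([R_block] * K)              # R = [R_1^T, ..., R_K^T]^T
    crb = crb_linear_gaussian(R, sigma_e2=1e-5)
    print(K, np.trace(crb))                   # total variance bound for this K
```

The trace of the returned matrix bounds the summed MSE of any unbiased estimator of $\mathbf{b}$, so it is the natural quantity to plot against the sample size.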

5. Results and Discussion

The simulation results of the CKFS algorithm are discussed in this section. The performance is first tested on synthetic data obtained from randomly generated Boolean networks under various scenarios and performance metrics. The algorithm is then assessed on the DREAM4 networks and the IRMA network.

5.1. Synthetic Data. Time-series data is produced from randomly generated Boolean networks using the system model (3) and (5). Two scenarios are considered for this purpose.

First, the comparison is performed by varying the sample size while keeping the network size fixed. The gene network consists of 8 genes (vertices) and 20 edges. In terms of network estimation, if the algorithm predicts an edge between two nodes which is not present in reality, an error referred to as a false alarm error (F) is said to have occurred. Another situation is the indication of the absence of an edge in the graph which is in fact present in the real network. This kind of error is termed a missed detection (M). The summation of these two errors, normalized over the total number of edges in the network, yields the Hamming distance. It is also important to consider the probability of predicting the true connections correctly, which is assessed by the true connections (T) metric. An algorithm with low Hamming distance and small false alarm error is particularly desirable, as predicting an edge erroneously can be troublesome for biologists. True connections indicate the reliability of the predictions.

Figure 2: Cramér-Rao bound on the estimation of parameters. The MSE for one representative $\theta$ is shown here for a network consisting of 8 vertices. (Plot: MSE on a $\log_{10}$ scale, from $10^{-7}$ to $10^{0}$, versus sample size, from 4 to 16; curves: CRB and CKFS.)

Figure 3 illustrates the performance of the CKFS algorithm and that of the EKF algorithm proposed in [34] in terms of the metrics described above. It is important to mention here that the same system model is assumed by both the CKFS and EKF algorithms for the purpose of this simulation. These metrics are the same as those used in [15]. The variances of both the system and measurement noises, $\sigma_w^2$ and $\sigma_v^2$, respectively, are taken to be $10^{-5}$ in all the simulations and are assumed to be known. It is noticed that EKF has a slightly lower false alarm rate when the number of samples is small; however, as the number of samples increases, CKFS yields a lower false alarm error. The Hamming distance for CKFS is also smaller than that for EKF, indicating a smaller cumulative error. True connections show a consistent behavior for the two algorithms when the number of samples is increased, where CKFS is able to predict connections more accurately. These experiments show the superiority of CKFS in terms of lower error rate.
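The error metrics described above can be sketched in code. This is a minimal illustration under stated assumptions: the networks are stored as binary adjacency matrices, every metric is normalized by the number of true edges (following the caption of Figure 3), and the function name `edge_metrics` is hypothetical.

```python
import numpy as np

def edge_metrics(true_adj, est_adj):
    """False alarms (F), missed detections (M), Hamming distance, and true connections (T)."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    n_edges = true_adj.sum()                       # normalization constant (assumed)
    false_alarm = np.sum(est_adj & ~true_adj)      # predicted edge, absent in truth
    missed = np.sum(~est_adj & true_adj)           # true edge, not predicted
    true_conn = np.sum(est_adj & true_adj)         # true edge, correctly predicted
    return {
        "F": false_alarm / n_edges,
        "M": missed / n_edges,
        "hamming": (false_alarm + missed) / n_edges,
        "T": true_conn / n_edges,
    }
```

For a three-gene example with true edges 1→2, 2→3, 3→1 and predicted edges 1→2, 1→3, 3→1, the sketch counts two true connections, one false alarm, and one missed detection.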

To obtain a more rigorous evaluation, the performance of the algorithms is then compared in a scenario which considers the sample size to be fixed and assumes networks of different sizes. The receiver operating characteristic (ROC) curves are plotted as performance measures. A higher area under the ROC curve (AUROC) shows more true positives for a given false positive and therefore indicates better classification. The performance of CKFS($N$, $E$, $K$) and EKF($N$, $E$, $K$) is shown in Figure 4, where $N$ stands for the number of nodes, $E$ represents the number of edges, and $K$ denotes the number of time points. It is observed that CKFS exhibits superior performance to EKF for networks of different sizes.

Figure 3: (a) False alarm errors, (b) Hamming distance, and (c) true connections, each plotted against sample size (5 to 40) for EKF and CKFS. The synthetic networks consist of 8 vertices and 20 edges. The metric is normalized over the number of edges. CKFS gives lower error and predicts more true connections with the increase in the sample size of data.

The complexity of the two algorithms is compared for synthetically generated networks with the number of genes equal to 10, 20, 30, and 40. The sample size is kept to 50 time points for each of these networks, and the run time for the EKF and CKFS algorithms is calculated as shown in Table 1. It is noted that EKF is faster for smaller network sizes, but as the network size increases, its run time gets much larger than that for CKFS. The main reason for this is that EKF [34] estimates the states and parameters by stacking them together, which requires large-sized matrix multiplications at each iteration. The benefit associated with performing dual estimation, as in CKFS, is that the parameters are estimated separately from the states. Since the system is linear and one-to-one for the parameters, inversion of much smaller matrices can be performed, reducing the computational complexity of the CKFS algorithm. CKFS is therefore particularly attractive for large-sized networks.

Figure 4: ROC curves (true positive rate versus false positive rate) for the performance of CKFS and EKF using synthetic data, with ($N$, $E$, $K$) equal to (a) (5, 10, 20), (b) (10, 12, 20), and (c) (15, 19, 20). The area under the ROC curve for CKFS is more than that for EKF for various sized networks.

Table 1: Run time in seconds for EKF and CKFS algorithms for varying network sizes for synthetically generated data. The number of sample points is fixed to 50.

| Number of genes | 10 | 20 | 30 | 40 |
| --- | --- | --- | --- | --- |
| EKF | 0.16 | 1.9 | 16.5 | 84 |
| CKFS | 1.2 | 4.3 | 11.5 | 24.1 |

5.2. DREAM4 Gene Networks. Several in silico networks have been produced in order to benchmark the performance of gene network inference algorithms. The dialogue on reverse engineering assessment and methods (DREAM) in silico networks serve as one of the popular benchmarks used for this purpose [39, 48]. In this section, the performance of the CKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge.

Five networks are produced using the known GRNs of Escherichia coli and Saccharomyces cerevisiae. The data set for each 10-gene network consists of 21 data points for five different perturbations. The inference is performed by using all the perturbations. The 100-gene network consists of data sets for ten perturbations. AUROC and area under the precision-recall curve (AUPR) are calculated for the five networks of both data sets and shown in Tables 2 and 3, respectively. The quantities precision and recall are defined as $P = T/(T + F)$ and $R = T/(T + M)$, respectively. For comparison purposes, the values of the two quantities for the time-series network identification (TSNI) algorithm, which exploits ordinary differential equations, are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR) for DREAM4 10-gene networks for the five different networks. AUPR values are given in parentheses.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
| --- | --- | --- | --- | --- | --- |
| ODE [39] | 0.62 (0.27) | 0.63 (0.32) | 0.58 (0.21) | 0.63 (0.23) | 0.68 (0.25) |
| CKFS | 0.63 (0.40) | 0.67 (0.50) | 0.72 (0.50) | 0.75 (0.49) | 0.81 (0.42) |
| Random [39] | 0.55 (0.18) | 0.55 (0.19) | 0.55 (0.17) | 0.57 (0.17) | 0.56 (0.16) |

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR) for DREAM4 100-gene networks for the five different networks. AUPR values are given in parentheses.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
| --- | --- | --- | --- | --- | --- |
| ODE [39] | 0.55 (0.02) | 0.55 (0.03) | 0.60 (0.03) | 0.54 (0.02) | 0.59 (0.03) |
| CKFS | 0.67 (0.13) | 0.57 (0.08) | 0.60 (0.10) | 0.62 (0.10) | 0.60 (0.07) |
| Random [39] | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) |

Figure 5: The inferred IRMA networks over the genes GAL80, GAL4, SWI5, CBF1, and ASH1: (a) gold standard, (b) inferred network using CKFS, and (c) inferred network using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate edges that are correct but whose directions are reversed, and red arrows indicate false positives.
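The precision and recall definitions of Section 5.2, together with AUROC, can be sketched as follows. This is an illustrative sketch, not the DREAM4 scoring code: the function names are hypothetical, edges are assumed to be ranked by the absolute value of an estimated weight, precision is set to 1 by convention when nothing is predicted, and AUROC is computed through the pairwise-ranking identity, which requires at least one true edge and one non-edge.

```python
import numpy as np

def precision_recall(true_edges, scores, threshold):
    """P = T/(T+F) and R = T/(T+M) for edges whose |score| exceeds the threshold."""
    truth = np.asarray(true_edges, dtype=bool).ravel()
    pred = np.abs(np.asarray(scores, dtype=float)).ravel() > threshold
    T = np.sum(pred & truth)       # true connections
    F = np.sum(pred & ~truth)      # false alarms
    M = np.sum(~pred & truth)      # missed detections
    precision = T / (T + F) if (T + F) > 0 else 1.0   # convention (assumed)
    recall = T / (T + M) if (T + M) > 0 else 0.0
    return precision, recall

def auroc(true_edges, scores):
    """AUROC as the probability that a true edge outranks a non-edge (ties count 1/2)."""
    truth = np.asarray(true_edges, dtype=bool).ravel()
    s = np.abs(np.asarray(scores, dtype=float)).ravel()
    diff = s[truth][:, None] - s[~truth][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

Sweeping `threshold` over the sorted scores traces out the precision-recall curve whose area gives AUPR.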

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which represents the expression samples at 21 time points. The switch-on data consists of 16 sample points and is obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for CKFS are assumed to be identical to those in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because the genes are known to interact with only a few other genes. The use of CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS offers an advantage over EKF in terms of smaller run time for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using DREAM4 10-gene and 100-gene networks and IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, “Computational systems biology,” Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, “Stochastic modeling and simulation of gene networks,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, “Understanding micro-RNA regulation: a computational perspective,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, “Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, “Current approaches to gene regulatory network modelling,” BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, “Inferring quantitative models of regulatory networks from expression data,” Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, “Clustering time series gene expression data based on sum-of-exponentials fitting,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, “Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure,” Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, “Identifying differentially expressed genes in microarray experiments with model-based variance estimation,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, “Gene clustering based on cluster-wide mutual information,” Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring connectivity of genetic regulatory networks using information-theoretic criteria,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, “Inference of gene regulatory networks based on a universal minimum description length,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring gene regulatory networks from time series data using the minimum description length principle,” Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling gene expression data using dynamic Bayesian networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D'Alche-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d'Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS '11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C A Penfold andD LWild ldquoHow to infer gene networks fromexpression profiles revisitedrdquo Interface Focus pp 857ndash870 2011

[40] I Cantone L Marucci F Iorio et al ldquoA yeast synthetic networkfor in vivo assessment of reverse-engineering and modelingapproachesrdquo Cell vol 137 no 1 pp 172ndash181 2009

[41] Y Huang I M Tienda-Luna and Y Wang ldquoReverse engineer-ing gene regulatory networks a survey of statistical modelsrdquoIEEE Signal Processing Magazine vol 26 no 1 pp 76ndash97 2009

[42] Z Wang F Yang D W C Ho S Swift A Tucker and X LiuldquoStochastic dynamic modeling of short gene expression time-series datardquo IEEE Transactions on Nanobioscience vol 7 no 1pp 44ndash55 2008

[43] H Xiong and Y Choe ldquoStructural systems identification ofgenetic regulatory networksrdquo Bioinformatics vol 24 no 4 pp553ndash560 2008

[44] R Tibshirani ldquoRegression shrinkage and selection via the lassordquoJournal of the Royal Statistical Society B vol 58 no 1 pp 267ndash288 1996

[45] E J Cands andT Tao ldquoDecoding by linear programmingrdquo IEEETransactions on Information Theory vol 51 no 12 pp 4203ndash4215 2005

[46] J D Geeter H V Brussel and J D Schutter ldquoA smoothly con-strained Kalman filterrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 19 no 10 pp 1171ndash1177 1997

[47] S M Kay Fundamentals of Statistical Signal Processing Estima-tion Theory Prentice-Hall New York NY USA 1993

[48] httpwikic2b2columbiaedudream


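where $\mathbf{y} = [\mathbf{y}_1^T, \ldots, \mathbf{y}_K^T]^T$, $\mathbf{R} = [\mathbf{R}_1^T, \ldots, \mathbf{R}_K^T]^T$, and $\mathbf{e} = [\mathbf{e}_1^T, \ldots, \mathbf{e}_K^T]^T$. The PDF $p(\mathbf{y} \mid \mathbf{b})$ is expressed as

$$p(\mathbf{y} \mid \mathbf{b}) = C \exp\left(-\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T (\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2}\right), \tag{27}$$

where $C$ is a constant. The derivative of $\ln p(\mathbf{y} \mid \mathbf{b})$ can be expressed as

$$\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}} = -\frac{\partial}{\partial \mathbf{b}}\left[\frac{(\mathbf{y} - \mathbf{R}\mathbf{b})^T (\mathbf{y} - \mathbf{R}\mathbf{b})}{2\sigma_e^2}\right] = \frac{\mathbf{R}^T\mathbf{y} - \mathbf{R}^T\mathbf{R}\mathbf{b}}{\sigma_e^2}. \tag{28}$$

It now follows that

$$\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)\left(\frac{\partial \ln p(\mathbf{y} \mid \mathbf{b})}{\partial \mathbf{b}}\right)^T = \frac{\mathbf{R}^T (\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^T \mathbf{R}}{\sigma_e^4}. \tag{29}$$

By taking the expectation of (29), and noting that $E[(\mathbf{y} - \mathbf{R}\mathbf{b})(\mathbf{y} - \mathbf{R}\mathbf{b})^T] = \sigma_e^2 \mathbf{I}$, the FIM in (25) is given by

$$\mathbf{I}(\mathbf{b}) = \frac{\mathbf{R}^T\mathbf{R}}{\sigma_e^2}. \tag{30}$$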

The inverse of the FIM in (30) can be used to place a lower bound on the estimation error of the parameter vector b. Figure 2 shows the comparison of the MSE of the CKFS algorithm with the CRB as a function of the number of samples K for one representative gene from the eight-gene network considered in Section 5.1. It is observed that the MSE of the estimated parameters decreases with increasing number of samples.
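The bound in (30) is straightforward to evaluate numerically. The following is a minimal sketch, assuming a generic regressor matrix R drawn at random purely for illustration (the paper's actual R is built from the estimated states); the function names `fisher_information` and `crb` are hypothetical. It shows that the per-parameter bound, the diagonal of the inverse FIM, does not increase as more sample rows are appended to R.

```python
import numpy as np

def fisher_information(R, sigma_e2):
    """FIM of b for y = R b + e with e ~ N(0, sigma_e2 * I), as in (30)."""
    return R.T @ R / sigma_e2

def crb(R, sigma_e2):
    """Per-parameter Cramer-Rao bounds: the diagonal of the inverse FIM."""
    return np.diag(np.linalg.inv(fisher_information(R, sigma_e2)))

# Illustrative data only: a random regressor matrix, not the paper's model.
rng = np.random.default_rng(0)
R_full = rng.standard_normal((32, 4))
sigma_e2 = 1e-5  # noise variance, matching the value used in the simulations

# Use the first K rows so that each larger K extends the smaller data set.
bounds = {K: crb(R_full[:K], sigma_e2) for K in (8, 16, 32)}
for K, b in bounds.items():
    print(K, b)
```

Because appending rows to R can only add a positive semidefinite term to the FIM, the printed bounds shrink (or stay equal) as K grows, which is the trend against which the MSE curve in Figure 2 is compared.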

5. Results and Discussion

The simulation results of the CKFS algorithm are discussed in this section. The performance is first tested on synthetic data obtained from randomly generated Boolean networks under various scenarios and performance metrics. The algorithm is then assessed on the DREAM4 networks and the IRMA network.

5.1. Synthetic Data. Time-series data is produced from randomly generated Boolean networks using the system model (3) and (5). Two scenarios are considered for this purpose.

Figure 2: Cramér-Rao bound on the estimation of parameters. The MSE for one representative $\theta$ is shown here for a network consisting of 8 vertices. (Horizontal axis: sample size, 4 to 16; vertical axis: MSE on a log10 scale; curves: CRB and CKFS.)

First, the comparison is performed by varying the sample size while keeping the network size fixed. The gene network consists of 8 genes and 20 edges. In terms of network estimation, if the algorithm predicts an edge between two nodes that is not present in the true network, an error referred to as a false alarm error (F) is said to have occurred. The opposite situation is the indication of the absence of an edge that is in fact present in the real network. This kind of error is termed missed detection (M). The sum of these two errors, normalized over the total number of edges in the network, yields the Hamming distance. It is also important to consider the probability of predicting the true connections correctly, which is assessed by the true connections (T) metric. An algorithm with low Hamming distance and small false alarm error is particularly desirable, as predicting an edge erroneously can be troublesome for biologists. True connections indicate the reliability of the predictions. Figure 3 illustrates the performance of the CKFS algorithm and that of the EKF algorithm proposed in [34] in terms of the metrics described above. It is important to mention here that the same system model is assumed by both the CKFS and EKF algorithms for the purpose of this simulation. These metrics are the same as those used in [15]. The variances of the system and measurement noises, $\sigma_w^2$ and $\sigma_v^2$, respectively, are taken to be $10^{-5}$ in all the simulations and are assumed to be known. It is noticed that EKF has a slightly lower false alarm rate when the number of samples is small; however, as the number of samples increases, CKFS yields a lower false alarm error. The Hamming distance for CKFS is also smaller than that for EKF, indicating a smaller cumulative error. True connections show a consistent behavior for the two algorithms when the number of samples is increased, with CKFS predicting connections more accurately. These experiments show the superiority of CKFS in terms of lower error rate.
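The error metrics used in this comparison can be sketched as follows. This is an illustrative sketch rather than the authors' evaluation code: the function name `edge_metrics` is hypothetical, networks are assumed to be given as binary adjacency matrices, and every metric is assumed to be normalized by the number of edges in the true network, as stated in the caption of Figure 3.

```python
import numpy as np

def edge_metrics(true_adj, pred_adj):
    """False alarms (F), missed detections (M), true connections (T), and
    Hamming distance, each normalized by the number of true edges."""
    true_adj = np.asarray(true_adj, dtype=bool)
    pred_adj = np.asarray(pred_adj, dtype=bool)
    n_edges = true_adj.sum()
    false_alarms = np.sum(pred_adj & ~true_adj)  # predicted but absent
    missed = np.sum(~pred_adj & true_adj)        # present but not predicted
    true_conn = np.sum(pred_adj & true_adj)      # present and predicted
    return {
        "F": false_alarms / n_edges,
        "M": missed / n_edges,
        "T": true_conn / n_edges,
        "hamming": (false_alarms + missed) / n_edges,
    }

# Toy 3-gene example (hypothetical data): the true network has 3 edges.
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
pred_adj = [[0, 1, 1],
            [0, 0, 0],
            [1, 0, 0]]
metrics = edge_metrics(true_adj, pred_adj)
print(metrics)
```

In the toy example one edge is predicted erroneously and one true edge is missed, so F and M are each one third and the Hamming distance is their sum.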

Figure 3: (a) False alarm errors, (b) Hamming distance, and (c) true connections, each plotted against sample size (5 to 40) for EKF and CKFS. The synthetic networks consist of 8 vertices and 20 edges. The metric is normalized over the number of edges. CKFS gives lower error and predicts more true connections with the increase in the sample size of data.

To obtain a more rigorous evaluation, the performance of the algorithms is then compared in a scenario in which the sample size is fixed and networks of different sizes are considered. The receiver operating characteristic (ROC) curves are plotted as performance measures. A higher area under the ROC curve (AUROC) shows more true positives for a given false positive rate and therefore indicates better classification. The performance of CKFS(N, E, K) and EKF(N, E, K) is shown in Figure 4, where N stands for the number of nodes, E represents the number of edges, and K denotes the number of time points. It is observed that CKFS exhibits superior performance to EKF for networks of different sizes.
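An ROC curve of this kind can be traced by ranking candidate edges by a confidence score and sweeping a threshold down the ranking. The sketch below assumes that the score of an edge is the magnitude of its estimated parameter, which is an assumption made here for illustration and not a detail stated in the text; the names `roc_points` and `auroc` are hypothetical, and tied scores are not treated specially.

```python
import numpy as np

def roc_points(true_edges, scores):
    """ROC points obtained by admitting edges one at a time in order of
    decreasing score. Assumes distinct scores (ties are not merged)."""
    y = np.asarray(true_edges, dtype=bool).ravel()
    s = np.asarray(scores, dtype=float).ravel()
    y = y[np.argsort(-s, kind="stable")]
    tp = np.cumsum(y)    # true positives after each admitted edge
    fp = np.cumsum(~y)   # false positives after each admitted edge
    tpr = np.concatenate(([0.0], tp / y.sum()))
    fpr = np.concatenate(([0.0], fp / (~y).sum()))
    return fpr, tpr

def auroc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example (hypothetical data): four candidate edges, two of them real.
true_edges = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]  # e.g. absolute values of estimated parameters
fpr, tpr = roc_points(true_edges, scores)
area = auroc(fpr, tpr)
print(area)
```

The area equals the fraction of (true edge, non-edge) pairs in which the true edge receives the higher score, so a value of 0.5 corresponds to random ranking and 1 to a perfect ranking.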

The complexity of the two algorithms is compared for synthetically generated networks with the number of genes equal to 10, 20, 30, and 40. The sample size is kept to 50 time points for each of these networks, and the run time for the EKF and CKFS algorithms is calculated as shown in Table 1. It is noted that EKF is faster for smaller network sizes, but as the network size increases its run time becomes much larger than that of CKFS. The main reason for this is that EKF [34] estimates the states and parameters by stacking them together, which requires large-sized matrix multiplications at each iteration. The benefit associated with performing dual estimation, as in CKFS, is that the parameters are estimated separately from the states. Since the system is linear and one-to-one for parameters, inversion of much smaller matrices can be performed, reducing the computational complexity of the CKFS algorithm. CKFS is therefore particularly attractive for large-sized networks.

Figure 4: ROC curves (true positive rate versus false positive rate) for the performance of CKFS and EKF using synthetic data, with (N, E, K) equal to (a) (5, 10, 20), (b) (10, 12, 20), and (c) (15, 19, 20). The area under the ROC curve for CKFS is more than that for EKF for various sized networks.

Table 1: Run time in seconds for EKF and CKFS algorithms for varying network sizes for synthetically generated data. The number of sample points is fixed to 50.

| Number of genes | 10 | 20 | 30 | 40 |
|---|---|---|---|---|
| EKF | 0.16 | 1.9 | 16.5 | 84 |
| CKFS | 1.2 | 4.3 | 11.5 | 24.1 |

5.2. DREAM4 Gene Networks. Several in silico networks have been produced in order to benchmark the performance of gene network inference algorithms. The Dialogue on Reverse Engineering Assessment and Methods (DREAM) in silico networks serve as one of the popular benchmarks used for this purpose [39, 48]. In this section, the performance of the CKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge. Five networks are produced using the known GRNs of Escherichia coli and Saccharomyces cerevisiae. The data set for each 10-gene network consists of 21 data points for each of five different perturbations. The inference is performed by using all the perturbations. The data set for each 100-gene network covers ten perturbations. AUROC and area under the precision-recall curve (AUPR) are calculated for the five networks of both data sets and are shown in Tables 2 and 3, respectively. The quantities precision and recall are defined as $P = T/(T + F)$ and $R = T/(T + M)$, respectively. For comparison purposes, the values of the two quantities for the time-series network identification (TSNI) algorithm, which exploits ordinary differential equations, are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for the five DREAM4 10-gene networks.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.62 (0.27) | 0.63 (0.32) | 0.58 (0.21) | 0.63 (0.23) | 0.68 (0.25) |
| CKFS | 0.63 (0.40) | 0.67 (0.50) | 0.72 (0.50) | 0.75 (0.49) | 0.81 (0.42) |
| Random [39] | 0.55 (0.18) | 0.55 (0.19) | 0.55 (0.17) | 0.57 (0.17) | 0.56 (0.16) |

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for the five DREAM4 100-gene networks.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.55 (0.02) | 0.55 (0.03) | 0.60 (0.03) | 0.54 (0.02) | 0.59 (0.03) |
| CKFS | 0.67 (0.13) | 0.57 (0.08) | 0.60 (0.10) | 0.62 (0.10) | 0.60 (0.07) |
| Random [39] | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) |

Figure 5: The inferred IRMA networks over the genes GAL80, GAL4, SWI5, CBF1, and ASH1: (a) gold standard, (b) inferred network using CKFS, and (c) inferred network using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate edges that are correct but whose directions are reversed, and red arrows indicate false positives.
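The precision and recall definitions used for the DREAM4 evaluation can be applied at every cut-off of a ranked edge list to obtain a precision-recall curve. The sketch below is a hedged illustration with a hypothetical function name (`precision_recall_curve_points`) and toy data; it does not reproduce the DREAM4 scoring scripts, and how the area under the curve is integrated is left open because conventions differ.

```python
import numpy as np

def precision_recall_curve_points(true_edges, scores):
    """Precision P = T / (T + F) and recall R = T / (T + M) after admitting
    the k highest-scoring edges, for k = 1, ..., number of candidates."""
    y = np.asarray(true_edges, dtype=bool).ravel()
    s = np.asarray(scores, dtype=float).ravel()
    y = y[np.argsort(-s, kind="stable")]
    t = np.cumsum(y)       # true connections among the admitted edges
    f = np.cumsum(~y)      # false alarms among the admitted edges
    m = y.sum() - t        # true edges not yet admitted (missed)
    precision = t / (t + f)
    recall = t / (t + m)
    return precision, recall

# Toy example (hypothetical data): four candidate edges, two of them real.
precision, recall = precision_recall_curve_points(
    [1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]
)
print(precision, recall)
```

Precision penalizes false alarms while recall penalizes missed detections, which is why AUPR is informative for sparse networks where true edges are rare.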

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which represents the expression samples at 21 time points. The switch-on data consists of 16 sample points and is obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be the same as in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while producing fewer false positives.

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because each gene is known to interact with only a few other genes. The use of CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS offers an advantage over EKF in terms of smaller run time for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using the DREAM4 10-gene and 100-gene networks and the IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, “Computational systems biology,” Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, “Stochastic modeling and simulation of gene networks,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, “Understanding micro-RNA regulation: a computational perspective,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, “Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, “Current approaches to gene regulatory network modelling,” BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, “Inferring quantitative models of regulatory networks from expression data,” Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, “Clustering time series gene expression data based on sum-of-exponentials fitting,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, “Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure,” Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, “Identifying differentially expressed genes in microarray experiments with model-based variance estimation,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, “Gene clustering based on cluster-wide mutual information,” Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring connectivity of genetic regulatory networks using information theoretic criteria,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, “Inference of gene regulatory networks based on a universal minimum description length,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring gene regulatory networks from time series data using the minimum description length principle,” Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling Gene Expression Data Using Dynamic Bayesian Networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. d'Alché-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d'Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS '11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C. A. Penfold and D. L. Wild, “How to infer gene networks from expression profiles, revisited,” Interface Focus, pp. 857–870, 2011.

[40] I. Cantone, L. Marucci, F. Iorio et al., “A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches,” Cell, vol. 137, no. 1, pp. 172–181, 2009.

[41] Y. Huang, I. M. Tienda-Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 76–97, 2009.

[42] Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.

[43] H. Xiong and Y. Choe, “Structural systems identification of genetic regulatory networks,” Bioinformatics, vol. 24, no. 4, pp. 553–560, 2008.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[45] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[46] J. D. Geeter, H. V. Brussel, and J. D. Schutter, “A smoothly constrained Kalman filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1171–1177, 1997.

[47] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New York, NY, USA, 1993.

[48] http://wiki.c2b2.columbia.edu/dream


Page 7: Research Article Reverse Engineering Sparse Gene ...downloads.hindawi.com/journals/abi/2013/205763.pdf · compressed sensing-based Kalman lter is used for the estimation of system

Advances in Bioinformatics 7

5 10 15 20 25 30 35 4002

022

024

026

028

03

032

034

036

038

Sample size

False

alar

m er

rors

EKFCKFS

(a)

035

04

045

05

055

06

065

07

Ham

min

g di

stanc

e

5 10 15 20 25 30 35 40Sample size

EKFCKFS

(b)

5 10 15 20 25 30 35 40045

05

055

06

065

07

075

08

Sample size

True

conn

ectio

ns

EKFCKFS

(c)

Figure 3 (a) (b) and (c) False alarm errors Hamming distance and true connections The synthetic networks consist of 8 vertices and 20edges The metric is normalized over the number of edges CKFS gives lower error and predicts more true connections with the increase inthe sample size of data

119864 represents the number of edges and 119870 denotes the timepoints It is observed that the CKFS exhibits superior perfor-mance than the EKF for networks of different sizes

The complexity of the two algorithms is compared forsynthetically generated networks with number of genes equalto 10 20 30 and 40The sample size is kept to 50 time pointsfor each of these networks and the run time for EKF andCKFS algorithms is calculated as shown in Table 1 It is notedthat EKF is faster for smaller network sizes but as the networksize increases the run time gets much larger than that forCKFS The main reason for this is that EKF [34] estimatesthe states and parameters by stacking them together whichrequires large-sized matrix multiplications at each iteration

The benefit associated with performing dual estimation asin CKFS is that the parameters are estimated separatelyfrom the states Since the system is linear and one-to-onefor parameters inversion of much smaller matrices can beperformed reducing the computational complexity of CKFSalgorithm CKFS is therefore particularly attractive for large-sized networks

52 DREAM4Gene Networks Several in silico networks havebeen produced in order to benchmark the performanceof gene network inference algorithms dialogue on reverseengineering assessment and methods (DREAM) in siliconetworks serve as one of the popular methods used for this

8 Advances in Bioinformatics

0 02 04 06 08 10

01

02

03

04

05

06

07

08

09

1

False positive rate

True

pos

itive

rate

ROC curve

CKFSEKF

(a)

0 02 04 06 08 10

01

02

03

04

05

06

07

08

09

1

False positive rate

True

pos

itive

rate

ROC curve

CKFSEKF

(b)

0 02 04 06 08 10

01

02

03

04

05

06

07

08

09

1

False positive rate

True

pos

itive

rate

ROC curve

CKFSEKF

(c)

Figure 4 ROC curves for the performance of CKFS and EKF using synthetic data (119873119864119870) (a) (b) and (c) (5 10 20) (10 12 20) and(15 19 20) The area under the ROC curve for CKFS is more than that for EKF for various sized networks

Table 1 Run time in seconds for EKF and CKFS algorithms forvarying network sizes for synthetically generated data The numberof sample points is fixed to 50

Number of genes 10 20 30 40EKF 016 19 165 84CKFS 12 43 115 241

purpose [39 48] In this section the performance of theCKFS algorithm is evaluated using the 10-gene and 100-gene networks released online by the DREAM4 challenge

Five networks are produced using the known GRNs ofEscherichia coli and Saccharomyces cerevisiae The data setsfor each of 10-gene network consists of 21 data points forfive different perturbations The inference is performed byusing all the perturbations The 100-gene network consistsof data sets for ten perturbations AUROC and area underthe precision-recall curve (AUPR) are calculated for the fivenetworks of both the data sets and shown in Tables 2 and 3respectively The quantities precision and recall are definedas 119875 = 119879(119879 + 119865) and 119877 = 119879(119879 + 119872) respectively Forcomparison purposes the values of the two quantities fortime-series network identification (TSNI) algorithm that

Figure 5: The inferred IRMA networks, each drawn over the five genes GAL80, GAL4, SWI5, CBF1, and ASH1. Panels (a), (b), and (c) show the gold standard, the network inferred using CKFS, and the network inferred using ODE [39, 40], respectively. Black arrows indicate true connections, blue arrows indicate edges that are correct but whose directions are reversed, and red arrows indicate false positives.

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for the five different DREAM4 10-gene networks.

Algorithm      Network 1      Network 2      Network 3      Network 4      Network 5
ODE [39]       0.62 (0.27)    0.63 (0.32)    0.58 (0.21)    0.63 (0.23)    0.68 (0.25)
CKFS           0.63 (0.40)    0.67 (0.50)    0.72 (0.50)    0.75 (0.49)    0.81 (0.42)
Random [39]    0.55 (0.18)    0.55 (0.19)    0.55 (0.17)    0.57 (0.17)    0.56 (0.16)

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR, in parentheses) for the five different DREAM4 100-gene networks.

Algorithm      Network 1       Network 2       Network 3       Network 4       Network 5
ODE [39]       0.55 (0.02)     0.55 (0.03)     0.60 (0.03)     0.54 (0.02)     0.59 (0.03)
CKFS           0.67 (0.13)     0.57 (0.08)     0.60 (0.10)     0.62 (0.10)     0.60 (0.07)
Random [39]    0.50 (0.002)    0.50 (0.002)    0.50 (0.002)    0.50 (0.002)    0.50 (0.002)

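exploits ordinary differential equations are also given [39]. The CKFS algorithm is found to perform better than the TSNI algorithm: its AUPR is higher on all ten networks, and its AUROC is higher on all but one network, where the two are equal.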
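As a rough illustration of how such scores can be obtained, the sketch below computes precision and recall for a thresholded edge set and approximates AUROC and AUPR by sweeping a threshold over ranked edge scores. It reads T, F, and M as the numbers of true positive, false positive, and missed edges, which is an assumption about the notation. The function names, the toy gene labels, and the trapezoidal integration are likewise illustrative choices and are not taken from the DREAM4 scoring scripts.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); gene labels here are hypothetical


def precision_recall(predicted: Set[Edge], gold: Set[Edge]) -> Tuple[float, float]:
    """Return (P, R) with P = T/(T+F) and R = T/(T+M)."""
    t = len(predicted & gold)   # true positives
    f = len(predicted - gold)   # false positives
    m = len(gold - predicted)   # missed edges (assumed meaning of M)
    precision = t / (t + f) if (t + f) else 0.0
    recall = t / (t + m) if (t + m) else 0.0
    return precision, recall


def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Integrate a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def roc_and_pr_areas(scores: Dict[Edge, float], gold: Set[Edge]) -> Tuple[float, float]:
    """Approximate (AUROC, AUPR) by adding candidate edges in order of score.

    Assumes every gold-standard edge appears among the scored candidates.
    """
    ranked = sorted(scores, key=lambda e: scores[e], reverse=True)
    n_pos = sum(1 for e in ranked if e in gold)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false candidate edge")

    tp = fp = 0
    roc = [(0.0, 0.0)]                    # (false positive rate, true positive rate)
    pr: List[Tuple[float, float]] = []    # (recall, precision)
    for edge in ranked:
        if edge in gold:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    # Anchor the PR curve at zero recall using the precision of the top edge.
    pr.insert(0, (0.0, pr[0][1]))
    return trapezoid_area(roc), trapezoid_area(pr)


# Toy example with made-up scores for four candidate edges.
toy_scores = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8,
              ("g3", "g1"): 0.7, ("g1", "g3"): 0.1}
toy_gold = {("g1", "g2"), ("g3", "g1")}
toy_auroc, toy_aupr = roc_and_pr_areas(toy_scores, toy_gold)
```

Ties between scores are not treated specially here, and the PR curve is anchored at zero recall with the precision of the top-ranked edge; other conventions would give slightly different areas.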

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates the gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which consist of expression samples at 21 time points. The switch-on data consist of 16 sample points and are obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be the same as in the previous simulations. Figure 5 shows the network inferred using CKFS, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.
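The edge categories used in Figure 5 can be made concrete with a small sketch. It assumes that both the gold standard and the inferred network are given as sets of directed (regulator, target) pairs. The function name and the toy gene labels are hypothetical and do not reproduce the IRMA edges.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def classify_edges(inferred: Set[Edge], gold: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Sort inferred edges into the categories drawn in the comparison figure."""
    result: Dict[str, Set[Edge]] = {
        "correct": set(),         # present in the gold standard with the same direction
        "reversed": set(),        # present in the gold standard with opposite direction
        "false_positive": set(),  # absent from the gold standard in either direction
    }
    for src, dst in inferred:
        if (src, dst) in gold:
            result["correct"].add((src, dst))
        elif (dst, src) in gold:
            result["reversed"].add((src, dst))
        else:
            result["false_positive"].add((src, dst))
    # Gold-standard edges that were not recovered in either direction.
    result["missed"] = {
        (src, dst) for src, dst in gold
        if (src, dst) not in inferred and (dst, src) not in inferred
    }
    return result


# Toy networks over made-up genes A to D.
toy_gold = {("A", "B"), ("B", "C"), ("C", "D")}
toy_inferred = {("A", "B"), ("C", "B"), ("A", "D")}
toy_result = classify_edges(toy_inferred, toy_gold)
```

Counting the sizes of these sets gives the numbers of true connections, direction errors, and false positives for a given inferred network.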

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because the genes are known to interact with only a few other genes. The use of CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used for comparing the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS has a smaller run time than EKF for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using the DREAM4 10-gene and 100-gene networks and the IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order in the network inference algorithm.

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, “Computational systems biology,” Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, “Stochastic modeling and simulation of gene networks,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, “Understanding micro-RNA regulation: a computational perspective,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, “Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, “Current approaches to gene regulatory network modelling,” BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, “Inferring quantitative models of regulatory networks from expression data,” Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, “Clustering time series gene expression data based on sum-of-exponentials fitting,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, “Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure,” Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, “Identifying differentially expressed genes in microarray experiments with model-based variance estimation,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, “Gene clustering based on cluster-wide mutual information,” Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring connectivity of genetic regulatory networks using information-theoretic criteria,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, “Inference of gene regulatory networks based on a universal minimum description length,” Eurasip Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring gene regulatory networks from time series data using the minimum description length principle,” Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling gene expression data using dynamic Bayesian networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D’Alché-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d’Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS ’11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C. A. Penfold and D. L. Wild, “How to infer gene networks from expression profiles, revisited,” Interface Focus, pp. 857–870, 2011.

[40] I. Cantone, L. Marucci, F. Iorio et al., “A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches,” Cell, vol. 137, no. 1, pp. 172–181, 2009.

[41] Y. Huang, I. M. Tienda-Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 76–97, 2009.

[42] Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.

[43] H. Xiong and Y. Choe, “Structural systems identification of genetic regulatory networks,” Bioinformatics, vol. 24, no. 4, pp. 553–560, 2008.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[45] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[46] J. D. Geeter, H. V. Brussel, and J. D. Schutter, “A smoothly constrained Kalman filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1171–1177, 1997.

[47] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New York, NY, USA, 1993.

[48] http://wiki.c2b2.columbia.edu/dream/



Page 9: Research Article Reverse Engineering Sparse Gene ...downloads.hindawi.com/journals/abi/2013/205763.pdf · compressed sensing-based Kalman lter is used for the estimation of system

Advances in Bioinformatics 9

[Figure 5: three network diagrams, panels (a), (b), and (c), each drawn over the genes GAL80, GAL4, SWI5, CBF1, and ASH1.]

Figure 5: The inferred IRMA networks. (a) Gold standard, (b) network inferred using CKFS, and (c) network inferred using ODE [39, 40]. Black arrows indicate true connections, blue arrows indicate edges that are correct but whose directions are reversed, and red arrows indicate false positives.

Table 2: Area under the ROC curve (AUROC) and area under the PR curve (AUPR) for the five DREAM4 10-gene networks. Entries are AUROC, with AUPR in parentheses.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.62 (0.27) | 0.63 (0.32) | 0.58 (0.21) | 0.63 (0.23) | 0.68 (0.25) |
| CKFS | 0.63 (0.40) | 0.67 (0.50) | 0.72 (0.50) | 0.75 (0.49) | 0.81 (0.42) |
| Random [39] | 0.55 (0.18) | 0.55 (0.19) | 0.55 (0.17) | 0.57 (0.17) | 0.56 (0.16) |

Table 3: Area under the ROC curve (AUROC) and area under the PR curve (AUPR) for the five DREAM4 100-gene networks. Entries are AUROC, with AUPR in parentheses.

| Algorithm | Network 1 | Network 2 | Network 3 | Network 4 | Network 5 |
|---|---|---|---|---|---|
| ODE [39] | 0.55 (0.02) | 0.55 (0.03) | 0.60 (0.03) | 0.54 (0.02) | 0.59 (0.03) |
| CKFS | 0.67 (0.13) | 0.57 (0.08) | 0.60 (0.10) | 0.62 (0.10) | 0.60 (0.07) |
| Random [39] | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) | 0.50 (0.002) |

exploits ordinary differential equations are also given [39]. The CKFS algorithm is found to perform significantly better than the TSNI algorithm.
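The AUROC and AUPR values in Tables 2 and 3 summarize how well a ranked list of predicted edges agrees with a gold-standard network. The following is a minimal sketch of how such areas can be computed from edge confidence scores. It is not the DREAM4 scoring script: the function name `roc_pr_areas`, the step-wise (average-precision) integration of the PR curve, and the assumption that no two scores are tied are illustrative choices, not details taken from the paper.

```python
def roc_pr_areas(scores, labels):
    """Return (AUROC, AUPR) for edge scores against 0/1 gold-standard labels.

    Assumes the scores contain no ties (hypothetical simplification).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true edge and one non-edge")

    # Rank candidate edges from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])

    tp = fp = 0
    auroc = aupr = 0.0
    prev_recall = 0.0
    for i in order:
        if labels[i]:
            tp += 1
            recall = tp / n_pos
            precision = tp / (tp + fp)
            # Step-wise integration of precision over recall.
            aupr += (recall - prev_recall) * precision
            prev_recall = recall
        else:
            fp += 1
            # Each non-edge adds a strip of width 1/n_neg and height TPR.
            auroc += (tp / n_pos) / n_neg
    return auroc, aupr
```

For a network, `scores` would hold one confidence value per candidate directed edge (for example, the magnitude of the estimated interaction parameter) and `labels` the corresponding gold-standard entries. A random ranking gives an AUROC near 0.5, which matches the Random rows of the tables.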

5.3. IRMA Gene Network. In addition to synthetic data, it is imperative to test the algorithms using real biological data. In this subsection, the performance of the CKFS algorithm is assessed using the in vivo reverse-engineering and modeling assessment (IRMA) network [40]. This network consists of five genes. Galactose activates gene expression in the network, whereas glucose deactivates it. The cells are grown in the presence of galactose and then switched to glucose to obtain the switch-off data, which comprise expression samples at 21 time points. The switch-on data consist of 16 sample points and are obtained by growing the cells in a glucose medium and then changing to galactose. The system and measurement noise variances for the CKFS are assumed to be the same as in the previous simulations. Figure 5 shows the inferred network, the gold standard, and the network inferred using TSNI. It is observed that the CKFS algorithm succeeds in predicting most of the interactions while giving fewer false positives.
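The edge categories used in Figure 5 (true connections, correct edges with reversed direction, and false positives) can be tallied mechanically once both networks are written as sets of directed edges. The sketch below is illustrative only: the function name `compare_networks` and the gene labels `G1` to `G4` in the usage note are hypothetical, and they do not reproduce the IRMA topology.

```python
def compare_networks(inferred, gold):
    """Classify directed edges, given as sets of (regulator, target) pairs."""
    # Edges present in both networks with the same direction.
    correct = inferred & gold
    # Edges whose endpoints are linked in the gold standard, but the other way.
    reversed_edges = {(a, b) for (a, b) in inferred
                      if (a, b) not in gold and (b, a) in gold}
    # Everything else that was predicted is a false positive.
    false_positives = inferred - correct - reversed_edges
    # Gold-standard edges that were not predicted (false negatives).
    missed = gold - inferred
    # Hamming distance: number of adjacency-matrix entries that differ.
    hamming = len(inferred ^ gold)
    return {
        "correct": correct,
        "reversed": reversed_edges,
        "false_positives": false_positives,
        "missed": missed,
        "hamming": hamming,
    }
```

For instance, with `gold = {("G1", "G2"), ("G2", "G3"), ("G3", "G4")}` and `inferred = {("G1", "G2"), ("G3", "G2"), ("G4", "G1")}`, the edge `G3 -> G2` is counted as reversed and `G4 -> G1` as a false positive. Note that a reversed edge contributes two units to the Hamming distance, one for the spurious entry and one for the missing entry.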

6. Conclusions

This paper presents a novel algorithm for inferring gene regulatory networks from time-series data. Gene regulation is assumed to follow a nonlinear state evolution model. The parameters of the system, which indicate the inhibitory or excitatory relationships between the genes, are estimated using compressed sensing-based Kalman filtering. The sparsity constraint on the parameters is crucial because each gene is known to interact with only a few other genes. The use of CKF and the dual estimation of states and parameters render the algorithm computationally efficient. The performance of CKFS is evaluated on synthetic data for different network sizes as well as varying numbers of sample points. ROC curves, Hamming distance, and true positives are used to compare the accuracy of the inferred network with that of EKF. It is observed that CKFS outperforms the EKF algorithm. In addition, CKFS has a smaller run time than EKF for large networks. The Cramér-Rao lower bound is also determined for the parameters of the model and compared with the MSE performance of the proposed algorithm. Assessment using the DREAM4 10-gene and 100-gene networks and the IRMA network data corroborates the superior performance of CKFS. Future research directions include incorporating the estimation of model order into the network inference algorithm.
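As a reminder of what the CKF contributes, the sketch below shows the prediction step of a cubature Kalman filter in the third-degree spherical-radial form of [36]: the state distribution is represented by 2n equally weighted points that are pushed through the nonlinear state function. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The names `cholesky` and `cubature_predict` are hypothetical, and `f` stands for any state-evolution function mapping an n-vector to an n-vector.

```python
import math


def cholesky(P):
    """Lower-triangular L with L L^T = P, for symmetric positive-definite P."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cubature_predict(f, m, P, Q):
    """Propagate mean m and covariance P through f; Q is the process noise."""
    n = len(m)
    L = cholesky(P)
    scale = math.sqrt(n)
    # 2n cubature points: m +/- sqrt(n) * (i-th column of L).
    points = []
    for i in range(n):
        for sign in (1.0, -1.0):
            points.append([m[r] + sign * scale * L[r][i] for r in range(n)])
    images = [f(x) for x in points]
    w = 1.0 / (2 * n)  # all points carry the same weight
    mean = [w * sum(y[r] for y in images) for r in range(n)]
    cov = [[w * sum(y[r] * y[c] for y in images) - mean[r] * mean[c] + Q[r][c]
            for c in range(n)] for r in range(n)]
    return mean, cov
```

A useful sanity check is that for a linear map f(x) = Ax the step returns exactly Am and APA^T + Q, as a standard Kalman filter would. Unlike the EKF, no Jacobian of `f` is required, only 2n function evaluations per step.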

Acknowledgments

This work was supported by US National Science Foundation (NSF) Grant 0915444 and QNRF-NPRP Grant 09-874-3-235. The material in this paper was presented in part at the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), San Antonio, TX, USA, December 2011.

References

[1] X. Zhou, X. Wang, and E. R. Dougherty, Genomic Networks: Statistical Inference from Microarray Data, John Wiley & Sons, New York, NY, USA, 2006.

[2] H. Kitano, “Computational systems biology,” Nature, vol. 420, pp. 206–210, 2002.

[3] X. Zhou and S. T. C. Wong, Computational Systems Bioinformatics, World Scientific, River Edge, NJ, USA, 2008.

[4] X. Cai and X. Wang, “Stochastic modeling and simulation of gene networks,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.

[5] D. Yue, J. Meng, M. Lu, C. L. P. Chen, M. Guo, and Y. Huang, “Understanding micro-RNA regulation: a computational perspective,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 77–88, 2012.

[6] R. Pal, S. Bhattacharya, and M. U. Caglar, “Robust approaches for genetic regulatory network modeling and intervention: a review of recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 66–76, 2012.

[7] H. Hache, H. Lehrach, and R. Herwig, “Reverse engineering of gene regulatory networks: a comparative study,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2009, Article ID 617281, 2009.

[8] T. Schlitt and A. Brazma, “Current approaches to gene regulatory network modelling,” BMC Bioinformatics, vol. 8, no. 6, p. 9, 2007.

[9] H. D. Jong, “Modeling and simulation of genetic regulatory systems: a literature review,” Journal of Computational Biology, vol. 9, no. 1, pp. 67–103, 2002.

[10] I. Nachman, A. Regev, and N. Friedman, “Inferring quantitative models of regulatory networks from expression data,” Bioinformatics, vol. 20, no. 1, pp. i248–i256, 2004.

[11] C. D. Giurcaneanu, I. Tabus, and J. Astola, “Clustering time series gene expression data based on sum-of-exponentials fitting,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 8, Article ID 358568, pp. 1159–1173, 2005.

[12] C. D. Giurcaneanu, I. Tabus, J. Astola, J. Ollila, and M. Vihinen, “Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure,” Journal of Computational Biology, vol. 11, no. 4, pp. 660–682, 2004.

[13] X. Cai and G. B. Giannakis, “Identifying differentially expressed genes in microarray experiments with model-based variance estimation,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2418–2426, 2006.

[14] X. Zhou, X. Wang, and E. R. Dougherty, “Gene clustering based on cluster-wide mutual information,” Journal of Computational Biology, vol. 11, no. 1, pp. 151–165, 2004.

[15] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring connectivity of genetic regulatory networks using information-theoretic criteria,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 262–274, 2008.

[16] J. Dougherty, I. Tabus, and J. Astola, “Inference of gene regulatory networks based on a universal minimum description length,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 482090, 2008.

[17] L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.

[18] W. Zhao, E. Serpedin, and E. R. Dougherty, “Inferring gene regulatory networks from time series data using the minimum description length principle,” Bioinformatics, vol. 22, no. 17, pp. 2129–2135, 2006.

[19] X. Zhou, X. Wang, R. Pal, I. Ivanov, M. Bittner, and E. R. Dougherty, “A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks,” Bioinformatics, vol. 20, no. 17, pp. 2918–2927, 2004.

[20] J. Meng, M. Lu, Y. Chen, S.-J. Gao, and Y. Huang, “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” BMC Genomics, vol. 11, no. 3, p. S11, 2010.

[21] Y. Zhang, Z. Deng, H. Jiang, and P. Jia, “Inferring gene regulatory networks from multiple data sources via a dynamic Bayesian network with structural EM,” in DILS, S. C. Boulakia and V. Tannen, Eds., vol. 4544 of Lecture Notes in Computer Science, pp. 204–214, Springer, New York, NY, USA, 2007.

[22] K. Murphy and S. Mian, Modeling gene expression data using dynamic Bayesian networks, University of California, Berkeley, Calif, USA, 2001.

[23] H. Liu, D. Yue, L. Zhang, Y. Chen, S. J. Gao, and Y. Huang, “A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling,” BMC Genomics, vol. 11, no. 3, p. S12, 2010.

[24] Y. Huang, J. Wang, J. Zhang, M. Sanchez, and Y. Wang, “Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks,” Journal of Multimedia, vol. 2, no. 3, pp. 46–56, 2007.

[25] B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D’Alché-Buc, “Gene networks inference using dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 2, pp. ii138–ii148, 2003.

[26] C. Rangel, D. L. Wild, F. Falciani, Z. Ghahramani, and A. Gaiba, “Modelling biological responses using gene expression profiling and linear dynamical systems,” Bioinformatics, pp. 349–356, 2005.

[27] M. Quach, N. Brunel, and F. d’Alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.

[28] F.-X. Wu, W.-J. Zhang, and A. J. Kusalik, “Modeling gene expression from microarray expression data with state-space equations,” in Pacific Symposium on Biocomputing, R. B. Altman, A. K. Dunker, L. Hunter, T. A. Jung, and T. E. Klein, Eds., pp. 581–592, World Scientific, River Edge, NJ, USA, 2004.

[29] R. Yamaguchi, S. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks with state-space models: mining high-dimensional and short time-course gene expression data,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 37–46, 2007.

[30] O. Hirose, R. Yoshida, S. Imoto et al., “Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models,” Bioinformatics, vol. 24, no. 7, pp. 932–942, 2008.

[31] J. Angus, M. Beal, J. Li, C. Rangel, and D. Wild, “Inferring transcriptional networks using prior biological knowledge and constrained state-space models,” in Learning and Inference in Computational Systems Biology, N. Lawrence, M. Girolami, M. Rattray, and G. Sanguinetti, Eds., pp. 117–152, MIT Press, Cambridge, UK, 2010.

[32] C. Rangel, J. Angus, Z. Ghahramani et al., “Modeling T-cell activation using gene expression profiling and state-space models,” Bioinformatics, vol. 20, no. 9, pp. 1361–1372, 2004.

[33] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.

[34] Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.

[35] A. Noor, E. Serpedin, M. Nounou, H. Nounou, N. Mohamed, and L. Chouchane, “An overview of the statistical methods used for inferring gene regulatory networks and protein-protein interaction networks,” Advances in Bioinformatics, vol. 2013, Article ID 953814, 12 pages, 2013.

[36] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1254–1269, 2009.

[37] A. Noor, E. Serpedin, M. N. Nounou, and H. N. Nounou, “A cubature Kalman filter approach for inferring gene regulatory networks using time series data,” in Proceedings of the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS ’11), pp. 25–28, 2011.

[38] A. Carmi, P. Gurfil, and D. Kanevsky, “Methods for sparse signal recovery using Kalman filtering with embedded pseudo-measurement norms and quasi-norms,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2405–2409, 2010.

[39] C. A. Penfold and D. L. Wild, “How to infer gene networks from expression profiles, revisited,” Interface Focus, pp. 857–870, 2011.

[40] I. Cantone, L. Marucci, F. Iorio et al., “A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches,” Cell, vol. 137, no. 1, pp. 172–181, 2009.

[41] Y. Huang, I. M. Tienda-Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 76–97, 2009.

[42] Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.

[43] H. Xiong and Y. Choe, “Structural systems identification of genetic regulatory networks,” Bioinformatics, vol. 24, no. 4, pp. 553–560, 2008.

[44] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[45] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[46] J. D. Geeter, H. V. Brussel, and J. D. Schutter, “A smoothly constrained Kalman filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 10, pp. 1171–1177, 1997.

[47] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New York, NY, USA, 1993.

[48] http://wiki.c2b2.columbia.edu/dream



Advances in Bioinformatics 11

equationsrdquo in Pacific Symposium on Biocomputing R B Alt-man A K Dunker L Hunter T A Jung and T E Klein Edspp 581ndash592 World Scientific River Edge NJ USA 2004

[29] R Yamaguchi S Yoshida S Imoto T Higuchi and S MiyanoldquoFinding module-based gene networks with state-spacemodelsmining high-dimensional and short time-course geneexpression datardquo IEEE Signal Processing Magazine vol 24 no1 pp 37ndash46 2007

[30] O Hirose R Yoshida S Imoto et al ldquoStatistical inferenceof transcriptional module-based gene networks from timecourse gene expression profiles by using state space modelsrdquoBioinformatics vol 24 no 7 pp 932ndash942 2008

[31] J Angus M Beal J Li C Rangel and D Wild ldquoInferringtranscriptional networks using prior biological knowledge andconstrained state-space modelsrdquo in Learning and Inference inComputational Systems Biology N Lawrence M GirolamiM Rattray and G Sanguinetti Eds pp 117ndash152 MIT PressCambridge UK 2010

[32] C Rangel J Angus Z Ghahramani et al ldquoModeling T-cell acti-vation using gene expression profiling and state-space modelsrdquoBioinformatics vol 20 no 9 pp 1361ndash1372 2004

[33] A Noor E SerpedinMN Nounou andHNNounou ldquoInfer-ring gene regulatory networks via nonlinear state-space modelsand exploiting sparsityrdquo IEEEACM Transactions on Computa-tional Biology and Bioinformatics vol 9 no 4 pp 1203ndash12112012

[34] Z Wang X Liu Y Liu J Liang and V Vinciotti ldquoAn extendedkalman filtering approach tomodeling nonlinear dynamic generegulatory networks via short gene expression time seriesrdquoIEEEACM Transactions on Computational Biology and Bioin-formatics vol 6 no 3 pp 410ndash419 2009

[35] A Noor E Serpedin M Nounou H Nounou N Mohamedand L Chouchane ldquoAn overview of the statistical methodsused for inferring gene regulatory networks and proteinproteininteraction networksrdquo Advances in Bioinformatics vol 2013Article ID 953814 12 pages 2013

[36] I Arasaratnam and S Haykin ldquoCubature kalman filtersrdquo IEEETransactions on Automatic Control vol 54 no 6 pp 1254ndash12692009

[37] A Noor E Serpedin M N Nounou and H N Nounou ldquoAcubature Kalman filter approach for inferring gene regulatorynetworks using time series datardquo in Proceedings of the IEEEInternationalWorkshop onGenomic Signal Processing and Statis-tics (GENSIPS rsquo11) pp 25ndash28 2011

[38] A Carmi P Gurfil and D Kanevsky ldquoMethods for sparsesignal recovery using kalman filtering with embedded rseudo-measurement norms and quasi-normsrdquo IEEE Transactions onSignal Processing vol 58 no 4 pp 2405ndash2409 2010

[39] C A Penfold andD LWild ldquoHow to infer gene networks fromexpression profiles revisitedrdquo Interface Focus pp 857ndash870 2011

[40] I Cantone L Marucci F Iorio et al ldquoA yeast synthetic networkfor in vivo assessment of reverse-engineering and modelingapproachesrdquo Cell vol 137 no 1 pp 172ndash181 2009

[41] Y Huang I M Tienda-Luna and Y Wang ldquoReverse engineer-ing gene regulatory networks a survey of statistical modelsrdquoIEEE Signal Processing Magazine vol 26 no 1 pp 76ndash97 2009

[42] Z Wang F Yang D W C Ho S Swift A Tucker and X LiuldquoStochastic dynamic modeling of short gene expression time-series datardquo IEEE Transactions on Nanobioscience vol 7 no 1pp 44ndash55 2008

[43] H Xiong and Y Choe ldquoStructural systems identification ofgenetic regulatory networksrdquo Bioinformatics vol 24 no 4 pp553ndash560 2008

[44] R Tibshirani ldquoRegression shrinkage and selection via the lassordquoJournal of the Royal Statistical Society B vol 58 no 1 pp 267ndash288 1996

[45] E J Cands andT Tao ldquoDecoding by linear programmingrdquo IEEETransactions on Information Theory vol 51 no 12 pp 4203ndash4215 2005

[46] J D Geeter H V Brussel and J D Schutter ldquoA smoothly con-strained Kalman filterrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 19 no 10 pp 1171ndash1177 1997

[47] S M Kay Fundamentals of Statistical Signal Processing Estima-tion Theory Prentice-Hall New York NY USA 1993

[48] httpwikic2b2columbiaedudream


