
SUMO - Supermodeling by combining imperfect models

Year 3 Report - Workpackage 7

Panče Panov, Nikola Simidjievski, Jovan Tanevski,
Darko Aleksovski, Aljaž Osojnik,
Ljupčo Todorovski, and Sašo Džeroski

May 14, 2014


WP7 year 3 report: Executive summary

The objective of WP7 is to develop methods for computational scientific discovery that can learn complete supermodels (ensembles of ODE models) of dynamical systems. Its sub-objectives include the development of techniques for (semi)automated generation of constituent models (Task 7.1), selection of an appropriate subset of models to include in the ensemble (Task 7.2), and selecting an appropriate method to combine the models and their predictions (Task 7.3). The focus for WP7 in Year 3 was Task 7.3 (combining the model predictions), but this required polishing and revisiting certain aspects of Tasks 7.1 and 7.2 (generating and selecting constituent models) and implementing all of the above within an integrated software environment.

The constituent models (ODEs) are learnt from observed data and domain knowledge by the process-based modeling tool ProBMoT. Note that both the structure and the parameters of the process-based models (and the corresponding ODEs) are learnt. Thus, the ensembles can contain models with different structures, which is a novelty as compared to ensembles based on (random) variation of the initial conditions or parameter values of the ODE models.

ProBMoT searches a space of possible model structures, defined by the given domain knowledge, and fits the parameter values of each structure considered to the input data. The model structures considered are ranked in terms of their error, either on the training data set or on a separate validation data set. The highest-ranked model is the output of ProBMoT.

To generate (diverse) base-level models, we have extended ProBMoT to support the three steps outlined above. First, it allows for two types of sampling of the (time-course) data given as input: bootstrap sampling and error-weighted sampling. Second, it allows for two (independent) kinds of selection of constituent models for the ensemble, one based on the error on training/validation data and the other on simulation stability; the latter is used for dynamic prediction-time selection of the models to be used for prediction within the ensemble. Finally, it is possible to use three different methods for combining the predictions of the constituent models, based on three different types of aggregation (taking a simple average, a weighted average, or a weighted median).

We empirically evaluated the bagging and boosting approaches to learning ensembles of process-based (ODE) models on 15 different data sets, coming from three aquatic ecosystems. In the evaluation, we focus on the predictive power of the two types of ensembles and of individual models. Note that previous approaches to computational scientific discovery have focused on explanatory models of dynamics and their descriptive power and have not investigated the predictive power of such models.


The results of the empirical evaluation can be summarized as follows. The ensembles learned by bagging significantly outperform individual models. The ensembles learned by bagging perform better than those learned by boosting, but not significantly. Finally, the ensembles learned by boosting perform better than individual models, but not significantly. Overall, the ensembles learned by bagging perform best.

A longer summary of the above contributions can be found in Sections 1, 2 and 3 of this report. A detailed description can be found in the papers associated with this report. In particular, papers P1, P2, and P3 describe the above contributions in detail.

In addition to developing methods for learning ensembles of ODE models, we have applied these methods (as well as learning individual models) to practically relevant problems from environmental and life sciences. These include modeling phytoplankton dynamics in three lakes (Bled, Slovenia; Kasumigaura, Japan; and Zuerich, Switzerland) and modeling hydrological and nutrient leaching processes at the watershed level (papers P3 and P4). They also include an application in systems biology, i.e., modeling the process of endocytosis, a crucial part of the immune response (papers P5, P6, and P7), as well as applications in the area of synthetic biology (talks IT1, IT2). More details can be found in Section 4 and the associated papers.

Besides learning ensembles of (continuous-time) ODE models, we have also addressed the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time. We have used the external dynamics approach, which formulates this task as the regression task of learning a difference (recurrence) equation. This model predicts the state of the system at a given discrete time point as a (nonlinear) function of the states and inputs of the system at several previous time points.

In particular, we propose methods for learning fuzzy linear model trees and ensembles thereof. These are used as the function approximators within the external dynamics approach. Our methods can learn trees and ensembles for predicting either a single output (one state variable) or multiple outputs (the entire system state).

We evaluate the proposed methods on a number of problems from the area of control engineering. We show that they perform well and are resilient to noise. Among the different approaches proposed, ensembles of trees for predicting multiple outputs can be recommended, since they perform best and yield the most succinct overall models. The above contributions are described in detail in papers P8, P9, P10 and P11.

Finally, we also tackle the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time in a streaming context. The data stream paradigm matches the concept of modeling dynamic systems very well, as the continued observation of the state and inputs of a dynamic system over (discrete) time results in a (potentially infinite) stream of data. Within this paradigm, we can handle large quantities of data (both a large number of variables and a large or even infinite number of observations), which makes it suitable for some novel application areas (such as global systems science).

We have developed novel methods for learning ensembles of regression and model trees on data streams, such as on-line bagging and on-line random forests. We have also proposed the approach of on-line learning of model trees with options, which presents an effective compromise between learning a single model and an ensemble. We have applied these approaches to several tasks of discrete-time modeling of dynamic systems. The above contributions are described in the papers P12 and P13.

Highlights

• We have developed equation discovery methods for learning ensemble models of dynamic systems, where both the structures of the constituent ODE models and their parameters are learnt from observed data. The ensembles can contain models with different structures, which is a novelty as compared to ensembles based on (random) variation of the initial conditions or parameter values of the ODE models. The ensembles consist of ODE models learnt by using the bagging and boosting approaches to sampling the observed data; their predictions are combined by using simple averaging of the predicted trajectories.

We empirically evaluated the two approaches (bagging and boosting) on 15 different data sets, coming from three aquatic ecosystems, comparing them to each other and to individual models. The ensembles learned by bagging perform best and significantly outperform individual models in terms of predictive power. Note that previous approaches to computational scientific discovery have focused on explanatory models of dynamics and their descriptive power and have not investigated the predictive power of such models.

• Besides learning ensembles of (continuous-time) ODE models, we have also addressed the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time. We have used the external dynamics approach, which formulates this task as the regression task of learning a difference (recurrence) equation. We address this task both in the batch learning and the on-line (data stream) context.


We have developed novel methods for learning ensembles of regression and model trees on data streams, such as on-line bagging and on-line random forests. We have also proposed the approach of on-line learning of model trees with options, which presents an effective compromise between learning a single model and an ensemble. We have successfully applied these approaches to several tasks of discrete-time modeling of dynamic systems.

List of papers associated with this deliverable

Conference and journal papers

P1 Nikola Simidjievski, Ljupčo Todorovski, Sašo Džeroski. Learning bagged models of dynamic systems. In: Zbornik, 5. študentska konferenca Mednarodne podiplomske šole Jožefa Stefana = 5th Jožef Stefan International Postgraduate School Students Conference, 23 May 2013, Ljubljana, Slovenia, Nejc Trdin, ed., et al. Ljubljana: Mednarodna podiplomska šola Jožefa Stefana, 2013, pp. 177–188.

P2 Nikola Simidjievski, Ljupčo Todorovski, Sašo Džeroski. Learning ensembles of population dynamics and their application to modelling aquatic ecosystems. Submitted to the Ecological Modelling journal, 2014.

P3 Nikola Simidjievski, Ljupčo Todorovski, Sašo Džeroski. Bagging and boosting of process-based models. Submitted to the Machine Learning journal, 2014.

P4 Mateja Škerjanec, Nataša Atanasova, Darko Čerepnalkoski, Sašo Džeroski, Boris Kompare. Development of a knowledge library for automated watershed modeling. Environmental Modelling & Software, Vol. 54, pp. 60–72, 2014.

P5 Jovan Tanevski, Ljupčo Todorovski, Sašo Džeroski. Automated modeling of Rab5-Rab7 conversion in endocytosis. In: Zbornik, 5. študentska konferenca Mednarodne podiplomske šole Jožefa Stefana = 5th Jožef Stefan International Postgraduate School Students Conference, 23 May 2013, Ljubljana, Slovenia, Nejc Trdin, ed., et al. Ljubljana: Mednarodna podiplomska šola Jožefa Stefana, 2013, pp. 209–218.


P6 Jovan Tanevski, Ljupčo Todorovski, Yannis Kalaidzidis, Sašo Džeroski. Inductive process modeling of Rab5-Rab7 conversion in endocytosis. In: Discovery Science: 16th International Conference, DS 2013, Singapore, October 6-9, 2013, Proceedings (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, vol. 8140), Johannes Fürnkranz, ed., et al. Berlin: Springer, 2013, pp. 265–280.

P7 Jovan Tanevski, Ljupčo Todorovski, Yannis Kalaidzidis, Sašo Džeroski. Process-based modeling of endocytosis. Submitted to the Information Sciences journal, 2014.

P8 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Model tree ensembles for modeling dynamic systems. In: Discovery Science: 16th International Conference, DS 2013, Singapore, October 6-9, 2013, Proceedings (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, vol. 8140), Johannes Fürnkranz, ed., et al. Berlin: Springer, 2013, pp. 17–32.

P9 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Model-Tree Ensembles for Noise-tolerant System Identification. Submitted to Advanced Engineering Informatics, 2014.

P10 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Model Tree Ensembles for the Identification of Multiple-Output Systems. Accepted for presentation at the European Control Conference, Strasbourg, 2014.

P11 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Ensembles of Linear Model Trees for the Identification of Multiple-Output Systems. Submitted to Information Sciences, 2014.

P12 Elena Ikonomovska, Joao Gama, Sašo Džeroski. Online Tree-based Ensembles and Option Trees for Regression on Evolving Data Streams. Accepted for publication in the Neurocomputing journal, 2014.

P13 Aljaž Osojnik, Sašo Džeroski. Modeling Dynamical Systems with Data Stream Mining. Submitted to the Neurocomputing journal, 2014.

Invited talks

IT1 Sašo Džeroski, Ljupčo Todorovski. Equation Discovery for Systems and Synthetic Biology. Invited talk at the CMUSV Symposium on Cognitive Systems and Discovery Informatics, Carnegie Mellon University, 2013. http://www.cogsys.org/app/webroot/symposium/2013/abstracts.html


IT2 Sašo Džeroski. Inductive Process Modeling for Learning the Dynamics of Biological Systems. Invited talk at the First International Conference on Formal Methods in Macro-Biology, Noumea, New Caledonia, 2014. http://fmmb2014.sciencesconf.org/

Dissertations

D1 Elena Ikonomovska. Algorithms for learning regression trees and ensembles on evolving data streams. Doctoral dissertation, 2012.

D2 Darko Čerepnalkoski. Process-based models of dynamical systems: Representation and induction. Doctoral dissertation, 2013.

D3 Darko Aleksovski. Tree ensembles for discrete-time modeling of nonlinear dynamic systems. Doctoral dissertation. Expected to be defended in 2014.

D4 Nikola Simidjievski. Learning ensembles of process-based models of dynamic systems. Doctoral dissertation. Expected to be defended in 2015.

D5 Jovan Tanevski. Deterministic and stochastic process-based modeling and design of dynamic systems in biology. Doctoral dissertation. Expected to be defended in 2015.


Contents

1 Generating a diverse set of models
  1.1 Bootstrap sampling
  1.2 Error-weighted sampling
  1.3 Bagging and boosting of ODE models
      1.3.1 Bagging of PBMs
      1.3.2 Boosting of PBMs
  1.4 ProBMoT: A software tool for generating ensembles of process-based models
  1.5 Empirical evaluation of bagging and boosting

2 Selecting ensemble constituents
  2.1 Selection at the level of the base algorithm
  2.2 Dynamic ensemble pruning at run time

3 Combining model predictions
  3.1 Simulation of individual ensemble models
  3.2 Combining model simulations

4 Applications in real world domains
  4.1 Applications in ecology
  4.2 Applications in systems & synthetic biology

5 Discrete-time modeling of dynamical systems with ensemble methods
  5.1 Model-tree ensembles for noise-tolerant system identification
  5.2 Ensembles of linear model trees for the identification of multiple-output systems

6 Data streams and dynamical systems
  6.1 Ensembles of data streams
  6.2 Modeling dynamical systems with data streams


1 Generating a diverse set of models

Ensemble methods construct a set of predictive models in the learning phase and then combine their predictions into a single one in the prediction phase. The first three sections of this deliverable present methods for learning ensembles of process-based models of dynamic systems. This first section presents methods for generating a diverse set of candidate ensemble constituents. The next section presents methods for selecting the constituents to be included in the ensemble. The third section presents methods for predicting with ensembles, i.e., combining the predictions of individual ensemble constituents into a single ensemble prediction.

Based on how the ensemble constituents are learned, the ensembles can be homogeneous or heterogeneous. In homogeneous ensembles, the base models are learned using the same learning algorithm on different samples of the training data, where the sampling variants include: sampling of data instances, e.g., bagging [2] and boosting [6]; sampling of data features/attributes, e.g., random subspaces [8]; or both, e.g., random forests [1]. On the other hand, in heterogeneous ensembles, the candidate base models are learned using different learning algorithms, e.g., stacking [10].

Here, we consider homogeneous ensembles, where we employ ProBMoT, an environment for simulating and learning process-based models, to learn the individual ensemble constituents. Each of these is learned on a different sample of the training data. We use two sampling techniques. The first is simple bootstrap sampling, where the data samples for learning ensemble constituents are independent; this is the technique used in bagging ensemble methods. The second technique is error-weighted sampling, where the probability of sampling a data point from the training set is proportional to the model prediction error on that point. In this way, ensemble constituents focus on improving erroneous predictions; this technique is often used in boosting.

1.1 Bootstrap sampling

The notable difference from the usual bootstrap sampling setting in machine learning is that in our case the data instances have a temporal ordering, which has to be retained in each data sample. To this end, we implement sampling by retaining the order of the instances and introducing instance weights. These correspond to the number of times the instance has been selected in the process of sampling with replacement. Instances that have not been selected remain in the sample with a zero weight.

To take into account the weights when learning a model from the sample, we use them to calculate the model prediction error, as follows:

WRMSE(m) = \sqrt{\frac{\sum_{t=0}^{n} \omega_t (\hat{y}_t - y_t)^2}{\sum_{t=0}^{n} \omega_t}},    (1)

where y_t and ŷ_t correspond to the measured and simulated values (obtained by simulating the model m) of the system variable y at time point t, n denotes the number of instances in the data sample, and ω_t denotes the weight of the data instance at time point t.
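To make the two ingredients concrete, the following Python sketch (an illustration of the scheme, not the ProBMoT code; the function names are ours) draws an order-preserving bootstrap sample as a vector of instance weights and computes the WRMSE of Equation 1:

    import numpy as np

    rng = np.random.default_rng(42)

    def bootstrap_weights(n):
        # Order-preserving bootstrap: instead of reshuffling the time series,
        # draw n time points with replacement and record how often each one
        # was chosen; unselected points stay in place with weight zero.
        idx = rng.integers(0, n, size=n)
        return np.bincount(idx, minlength=n)

    def wrmse(y_obs, y_sim, w):
        # Weighted root mean squared error of Equation 1.
        return np.sqrt(np.sum(w * (y_sim - y_obs) ** 2) / np.sum(w))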

1.2 Error-weighted sampling

While in bootstrap sampling the data samples are random and mutually independent, in error-weighted sampling we take into account the prediction error of the model (of an incomplete ensemble). The weights for a particular sample are distributed among the data points proportionally to the model error on those points. More specifically, for each data sample the weights of individual data points are updated using the formula

\omega_t \leftarrow \omega_t \cdot \left(\frac{\bar{L}}{1 - \bar{L}}\right)^{1 - L_t},    (2)

where L_t is the model error at time point t, and L̄ is the average model error on the whole training set. Both are estimated using the model learned in the previous iteration. For the first learning iteration, all the weights are set to 1.
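As a minimal illustration (ours, not ProBMoT's code), the update of Equation 2 can be applied to a whole weight vector at once:

    import numpy as np

    def boosting_update(w, L_t, L_bar):
        # Equation 2: per-point losses L_t and average loss L_bar come from
        # the model learned in the previous iteration; points with small loss
        # have their weights shrunk, so hard points gain relative weight.
        return w * (L_bar / (1.0 - L_bar)) ** (1.0 - L_t)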

1.3 Bagging and boosting of ODE models

Bagging (bootstrap aggregation), developed by Breiman [2], is one of the first and simplest ensemble learning methods. This method uses bootstrap sampling with aggregation. First, data instances are randomly sampled, with replacement, from the training set to obtain bootstrap replicates. Next, each base model is learned from a different bootstrap replicate.

Boosting refers to a general approach for obtaining an accurate prediction by combining several rough ones, based on different weights of the examples from the training data set. The AdaBoost algorithm, proposed by [7], is an implementation of the boosting approach for the task of classification. AdaBoost works iteratively; it uses differently weighted versions of the training data for learning the base models at each iteration. Depending on the outcome of the past iteration, this method decreases (for correct classifications) or increases (for incorrect classifications) the weight of every instance for the subsequent iteration of training the model. This ensures that the weak predictors focus on different instances, thus creating more robust ensembles. In addition, the implementation of [4] successfully tackles the problem of combining regressors using AdaBoost.

In the following two subsections, we introduce algorithms for bagging and boosting process-based models.

1.3.1 Bagging of PBMs

Algorithm 1 Bagging process-based models

 1: procedure Bagging(im, lib, D_T, D_V, k) returns Ensemble
 2:   Ensemble ← ∅                              ▷ set of base models
 3:   let N                                     ▷ number of measurements in D_T
 4:   for i = 1 to k do
 5:     ω_t ← sample(D_T), t = 0..N             ▷ assign random weights to the data points
 6:     ω ← normalize(ω, N)                     ▷ normalize to N
 7:     model_i ← probmot(im, lib, D_T, D_V, ω)
 8:     β_i ← confidence(model_i, D_V)
 9:     Ensemble ← Ensemble ∪ {(model_i, β_i)}
10:   end for
11: end procedure
12:
13: function confidence(model, D) returns β
14:   let ŷ_t                                   ▷ simulated system variable y at time point t
15:   let y_t                                   ▷ measured system variable y at time point t
16:   let N                                     ▷ number of measurements in D
17:   ŷ_t ← simulate(model, D), t = 0..N
18:   L_t ← |ŷ_t − y_t|² / (sup_t |ŷ_t − y_t|)², t = 0..N   ▷ squared loss at each time point t
19:   L ← Σ_{t=0}^{N} L_t                       ▷ calculate average loss
20:   β ← L / (1 − L)                           ▷ calculate confidence
21: end function

Algorithm 1 provides an outline of our approach to bagging process-based models. The procedure Bagging() takes five inputs: a conceptual model im, a library of domain knowledge lib, a training data set D_T, a validation data set D_V, and an integer k that denotes the number of candidate ensemble constituents we are going to learn. The output is a set of process-based models Ensemble. Using a call to probmot() in line 7, we learn the candidate base models from different samples of the training data. The standard input to the probmot() procedure is a conceptual model, a library, and a training data set. Additionally, here it takes two more inputs: a validation data set D_V and a set of weights ω. The first is used to estimate the performance of the learned model on unseen data, while the second influences the model error estimate according to Equation 1.

The output of a modeling task, when using ProBMoT, is an ordered list of models, ranked according to their error on the training data set or on the validation data set, if the latter is available. The highest-ranked model is considered to be a candidate ensemble constituent and is therefore added to the output set Ensemble together with its confidence β, calculated on the validation data set (D_V) by the call to confidence().
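The confidence computation of Algorithm 1 (lines 13-21) translates directly into Python. The sketch below is ours and assumes, as in AdaBoost.R2 [4], that the summed per-point losses amount to an average loss L in [0, 1):

    import numpy as np

    def confidence(y_sim, y_obs):
        # Squared loss per time point, normalized by the largest absolute
        # error (line 18), averaged (line 19), and mapped to the confidence
        # beta = L / (1 - L) (line 20); low beta means high confidence.
        err = np.abs(y_sim - y_obs)
        L_t = (err / err.max()) ** 2
        L = L_t.mean()
        return L / (1.0 - L)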

1.3.2 Boosting of PBMs

Algorithm 2 provides an outline of our approach to boosting process-based models. As in the case of bagging, the Boosting() procedure takes the same five inputs: a conceptual model im, a library of domain knowledge lib, training and validation data sets (D_T and D_V), and an integer k denoting the number of boosting iterations to be performed. In contrast to bagging, here we start with the complete training data set. We introduce the same concept of weighting, but instead of randomly choosing time points at the beginning, we start by setting all the weights to 1. After every boosting iteration, the weights are recalculated (line 7 in Algorithm 2) according to the error of the ensemble model selected in the previous iteration, using the update formula presented in Equation 2.

The reweight() function implements the update formula and takes three inputs: the ensemble constituent selected in the current boosting iteration (denoted model), a data set D, and the respective set of weights ω. First, the model is simulated on the data set D, resulting in a trajectory ŷ. Based on the model error at each time point in the trajectory and the set of weights ω, an average loss L̄ of the model is calculated (lines 18 and 19 in Algorithm 2). In turn, the loss is transformed into a confidence measure β, where low values of β denote high model confidence. Finally, the set of weights is updated: the smaller the loss at a time point, the more its weight is reduced, shifting the focus to more "difficult" parts of the data set in future iterations of the boosting algorithm.

As in bagging, at each boosting iteration the highest-ranked model output by ProBMoT is considered to be a candidate ensemble constituent and is therefore added to the output set Ensemble.


Algorithm 2 Boosting process-based models

 1: procedure Boosting(im, lib, D_T, D_V, k) returns Ensemble
 2:   Ensemble ← ∅                              ▷ set of base models
 3:   let N                                     ▷ number of measurements in D_T
 4:   ω_t ← 1, t = 0..N                         ▷ ω_t is the weight of time point t
 5:   for i = 1 to k do
 6:     model_i ← probmot(im, lib, D_T, D_V, ω)
 7:     ω ← reweight(model_i, D_T, ω)
 8:     β_i ← confidence(model_i, D_V)
 9:     Ensemble ← Ensemble ∪ {(model_i, β_i)}
10:   end for
11: end procedure
12:
13: function reweight(model, D, ω) returns ω
14:   let ŷ_t                                   ▷ simulated system variable y at time point t
15:   let y_t                                   ▷ measured system variable y at time point t
16:   let N                                     ▷ number of measurements in D
17:   ŷ_t ← simulate(model, D), t = 0..N
18:   L_t ← |ŷ_t − y_t|² / (sup_t |ŷ_t − y_t|)², t = 0..N   ▷ squared loss at each time point t
19:   L ← (Σ_{t=0}^{N} L_t · ω_t) / (Σ_{t=0}^{N} ω_t)       ▷ average loss, weighted
20:   β ← L / (1 − L)                           ▷ calculate confidence
21:   ω_t ← ω_t · β^(1−L_t), t = 0..N           ▷ update weights
22:   ω ← normalize(ω, N)                       ▷ normalize to N
23: end function


1.4 ProBMoT: A software tool for generating ensembles of process-based models

ProBMoT (which stands for Process-Based Modeling Tool, [9]) is a software tool for the modeling, parameter estimation, and simulation of process-based models. ProBMoT follows the process-based modeling paradigm: it employs domain-specific modeling knowledge formulated in the form of a library of entity and process templates. These provide templates for modeling the entities and processes (i.e., interactions between the entities) in the domain of interest.


Figure 1: An overview of the ProBMoT procedure for learning process-based models from knowledge and data. (Flowchart: the incomplete model and the library feed the model structure generation step, which produces candidate model structures; parameter estimation, driven by the measurements, the output specification, and the settings, turns these into candidate models.)

Template entities provide general knowledge about the objects modeled, e.g., ranges of values for the variables and constants described by the entities. Template processes provide general knowledge about the interactions modeled, e.g., pieces of equations that describe the rates of processes and value ranges for rate constants.

A process-based model refers to these templates to specify the entities and processes of the observed system at hand. Where a human modeler cannot make specific modeling decisions about the entities and processes of the observed system, she can specify a conceptual model that covers a range of process-based model structures. ProBMoT can then search through the whole range of candidate model structures and find the one that fits the observed data best.

Thus, when learning process-based models, ProBMoT takes a conceptual model of the observed system as input, in addition to the library of domain-specific modeling knowledge (see Figure 1). The conceptual model specifies the expected logical structure of the model in terms of components (entities) and interactions (processes) between them, without specifying the particular modeling choices. The library of knowledge, on the other hand, provides a set of candidate modeling choices at different levels of abstraction. ProBMoT combines the conceptual model with the library of modeling choices to obtain a list of candidate model structures. For each model structure, the parameter values are estimated so as to maximize the fit of the model simulation to the observed system behavior.
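Conceptually, the candidate structures are the cross-product of the modeling alternatives that the library offers for each unresolved choice in the conceptual model. A hypothetical Python illustration (the choice names are invented, and ProBMoT's actual formalism is richer):

    from itertools import product

    # Each unresolved modeling choice has a set of alternatives in the library.
    choices = {
        "growth":  ["exponential", "logistic", "monod"],
        "grazing": ["linear", "saturating"],
    }

    # The candidate model structures are the cross-product of the alternatives.
    candidate_structures = [dict(zip(choices, combo))
                            for combo in product(*choices.values())]
    print(len(candidate_structures))  # 6 candidate structures to parameterize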

The parameter estimation process within ProBMoT is based on the meta-heuristic optimization framework jMetal 4.5 [5], which implements a number of global optimization algorithms. In particular, ProBMoT employs the Differential Evolution algorithm to optimize the fit between simulations and observations. To measure the fit, ProBMoT implements a variety of error functions, such as the sum of squared errors (SSE), mean squared error (MSE), root mean squared error (RMSE), root relative squared error (RRSE), and weighted root mean squared error (WRMSE).

To obtain model simulations, each process-based model is transformed into a system of ODEs. In turn, ProBMoT employs the CVODE solver of ordinary differential equations from the SUNDIALS suite [3]. Finally, the output of ProBMoT is a set of process-based models, sorted according to the estimated model error.
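The same simulate-score-optimize loop can be sketched in a few lines of Python, with SciPy standing in for jMetal and CVODE; this is an illustration of the scheme on a toy logistic-growth model, not ProBMoT itself:

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import differential_evolution

    # Synthetic "measurements" of logistic growth dy/dt = r*y*(1 - y/K).
    t_obs = np.linspace(0.0, 10.0, 50)
    y_obs = 10.0 / (1.0 + 9.0 * np.exp(-0.8 * t_obs))

    def rmse_of_fit(params):
        # Simulate the candidate parameterization and score it with RMSE.
        r, K = params
        sol = solve_ivp(lambda t, y: r * y * (1.0 - y / K),
                        (t_obs[0], t_obs[-1]), [y_obs[0]], t_eval=t_obs)
        return np.sqrt(np.mean((sol.y[0] - y_obs) ** 2))

    # Differential Evolution searches the parameter space within given bounds.
    result = differential_evolution(rmse_of_fit,
                                    bounds=[(0.01, 2.0), (1.0, 50.0)], seed=1)
    print(result.x)  # close to the true (r, K) = (0.8, 10)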

Two changes to the ProBMoT implementation were necessary to employ it in the context of learning ensembles of process-based models. First, we implemented a new measure of fit between observations and simulations that takes into account the weights of the data points assigned by the sampling procedures. To this end, we used the formula for the weighted root mean squared error from Equation 1. Second, we allowed for the estimation and ranking of models based on their performance on unseen data, i.e., based on a separate validation set of observations.

1.5 Empirical evaluation of bagging and boosting

To test the utility of ensembles of process-based models, we performed an extensive empirical evaluation of the performance of ensembles and single models. The comparative analysis of the results shows the following:

• Ensembles of process-based models learned with the bagging algorithm significantly outperform single process-based models;

• Ensembles of process-based models learned with the bagging algorithm outperform the ones learned with boosting, but the difference in performance is not significant;

• Ensembles of process-based models learned with the boosting algorithm outperform single process-based models, but the difference in performance is not significant.


2 Selecting ensemble constituents

2.1 Selection at the level of the base algorithm

At each iteration of building an ensemble, ProBMoT is used as the base-level algorithm for learning candidate ensemble constituents. In contrast with the usual machine learning setting for learning ensembles, where a base-level algorithm typically returns a single predictive model as the result of learning, ProBMoT returns a ranked list of models. This gives us the opportunity to consider different models from this ranked list as ensemble constituents.

Based on the results of a set of preliminary experiments, we decided to follow the traditional setting, where only the top-ranked model is selected to be an ensemble constituent. However, we use two different ways to rank the models within ProBMoT. The default ranking is based on the models' fit to the training data set, while an alternative one is based on the models' fit to a separate validation data set with observations not seen in the training phase.

The empirical comparison of the two selection techniques shows that selecting the best model based on a separate validation data set outperforms the default approach. Using the fit to the training data to rank the models leads to overfitting. For details, see the attached articles Learning ensembles of population dynamics and their application to modelling aquatic ecosystems (P2) and Bagging and boosting of process-based models (P3).

2.2 Dynamic ensemble pruning at run time

In contrast with the traditional machine learning setting, where a model gives predictions for each data instance separately and independently from the others, here a single model simulation provides predictions for all the observed data time points at once. Simulating a process-based model requires minimal input to obtain a whole trajectory: initial values (at time 0) for the endogenous (system) variables and complete trajectories (values at each time point) for the exogenous (forcing) variables. In a predictive scenario, where the model is simulated on unseen data, this can often lead to divergent trajectories and therefore substantial predictive error. For this reason, we examine the simulated values and check whether they satisfy the upper- and lower-bound constraints specified in the library of background knowledge. If a predicted/simulated value is not within the specified bounds, the whole trajectory of that particular model is discarded from the resulting ensemble prediction. Therefore, we select the models to be included in the ensemble simulation (i.e., the base-level predictions to be combined into the ensemble prediction) at run time.
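A minimal Python sketch of this run-time check (ours; bounds is a hypothetical stand-in for the constraints read from the library):

    import numpy as np

    def prune_at_runtime(trajectories, bounds):
        # Keep only the trajectories whose simulated values stay inside the
        # [lo, hi] bounds of the knowledge library; bounds maps the index of
        # a system variable to its (lo, hi) pair.
        valid = []
        for y in trajectories:  # y has shape (n_vars, n_timepoints)
            if all(np.all((y[i] >= lo) & (y[i] <= hi))
                   for i, (lo, hi) in bounds.items()):
                valid.append(y)
        return valid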


To illustrate how often we run into divergent simulation problems when simulating process-based models on unseen data, we performed a simple experiment with a set of fifty process-based models. These are applied to 15 different datasets. We compare the results of two techniques for simulating ensembles (see Table 1). In the first, we combine the individual predictions of all the models, ignoring the constraints given in the library of domain knowledge (labeled complete). In the second, we combine only the predictions of the models that are valid in terms of the library (labeled pruned). Table 1 also reports the number of models discarded when pruning the ensemble.

Table 1: Performance errors of the complete and the pruned ensemble, and the number of base models pruned from the complete ensemble, for the fifteen test datasets.

Dataset   Complete    Pruned   # models pruned (out of 50)
B1        3.515E+03   1.104    25
B2        1.235       1.235     0
B3        1.073       1.045     5
B4        0.737       0.737     0
B5        0.604       0.604     0
K1        2.387       0.946    10
K2        4.498       1.840    48
K3        6.631       0.916     7
K4        0.772       0.997     4
K5        0.962       0.962     0
Z1        1.047       0.989     1
Z2        1.323       1.096     3
Z3        1.289       1.226     3
Z4        0.976       0.976     0
Z5        5.383       1.375    12

From the table, we can see that for all but one experimental dataset (K4), the pruned ensemble outperforms (or performs equally well as) the complete ensemble. Note also that, in several cases (B1, K1, K2, K3, Z5), the performance of the ensemble is substantially improved. By discarding models with divergent simulations from the ensemble, we can ensure valid ensemble predictions and stable simulations.


3 Combining model predictions

3.1 Simulation of individual ensemble models

Algorithm 3 outlines the procedure for simulating a given ensemble of process-based models. In order to simulate an ensemble, each base model needs to be simulated. As discussed in the previous section, we apply run-time pruning of the ensemble based on the constraints on the simulated values specified in the library of domain knowledge. The resulting ensemble simulation is a combination of the predictions of the individual base models at the respective time points.

Algorithm 3 Simulating ensembles

 1: function simulateEnsemble(ensemble, lib, D, scheme) returns y_e
 2:   y_v ← ∅                                  ▷ y_v denotes the valid model simulations
 3:   β_v ← ∅                                  ▷ β_v denotes the valid model confidences
 4:   y_e ← ∅                                  ▷ y_e denotes the resulting ensemble simulation
 5:   let N                                    ▷ length of the simulated trajectory
 6:   for all (model, β) ∈ ensemble do
 7:     y ← simulate(model, D)
 8:     if inrange(x, lib) for all x ∈ y then
 9:       {y_v, β_v} ← {y_v, β_v} ∪ {(y, β)}
10:     else continue
11:     end if
12:   end for
13:   if scheme = average then
14:     y_e ← average({y_v})
15:   else if scheme = weightedAverage then
16:     y_e ← weightedAverage({y_v}, {β_v})
17:   else
18:     y_e ← weightedMedian({y_v}, {β_v})
19:   end if
20: end function

The simulateEnsemble() function takes four inputs: a set of process-based models denoted ensemble, the library of domain knowledge lib, a data set D, and a label scheme indicating which combination scheme should be used. The resulting prediction of the ensemble is a trajectory denoted y_e. To obtain it, each model in the set is simulated. The result of the prediction of an individual model for the data set D is a trajectory y. The algorithm then checks whether the simulated trajectory is within the ranges specified in the modeling knowledge library and performs the run-time ensemble pruning described in the previous section. Finally, the simulations of the individual models are combined as described below.

3.2 Combining model simulations

For combining the predictions of the constituent ensemble models into the prediction of the ensemble, we use three aggregation functions: average, weighted average, and weighted median, commonly used for regression ensembles in the traditional machine learning setting [4]. In the case of the simple average, all base-model predictions are equally weighted, while for the weighted average, we use the models' confidences β as weights. The same set of weights is used when calculating the weighted median value. The empirical comparison of the three combining schemes shows that the simple average outperforms the other two schemes. Although the difference is not significant, the simplicity of the average scheme renders it superior to the other two in the context of ensembles of process-based models.
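The three schemes can be sketched as follows (our illustration; following Drucker's AdaBoost.R2 [4] we assume the weight of a model is log(1/β), so that low β, i.e., high confidence, yields a high weight):

    import numpy as np

    def weighted_median(values, weights):
        # The value at which the cumulative weight first reaches half the total.
        order = np.argsort(values)
        v, w = values[order], weights[order]
        return v[np.searchsorted(np.cumsum(w), 0.5 * w.sum())]

    def combine(trajectories, betas, scheme="average"):
        # Combine base-model trajectories (n_models x n_timepoints) point-wise.
        Y = np.asarray(trajectories)
        if scheme == "average":
            return Y.mean(axis=0)
        w = np.log(1.0 / np.asarray(betas))  # assumed AdaBoost.R2-style weights
        if scheme == "weightedAverage":
            return (w[:, None] * Y).sum(axis=0) / w.sum()
        return np.array([weighted_median(Y[:, t], w) for t in range(Y.shape[1])])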


4 Applications in real world domains

In addition to developing methods for learning ensembles of ODE models, we have applied these methods (as well as learning individual models) to practically relevant problems from environmental and life sciences. These include modeling phytoplankton dynamics in three lakes (Bled, Slovenia; Kasumigaura, Japan; and Zuerich, Switzerland) and modeling hydrological and nutrient leaching processes at the watershed level. They also include an application in systems biology, i.e., modeling the process of endocytosis, a crucial part of the immune response, while previous applications of our methods addressed tasks from synthetic biology.

4.1 Applications in ecology

Modeling aquatic ecosystems. Ensemble methods are machine learning methods that construct a set of models and combine their outputs into a single prediction. The models within an ensemble can have different structures and parameters and make diverse predictions. Ensembles achieve high predictive performance, benefiting from the diversity of the individual models and outperforming them.

We developed approaches for learning ensemble models of dynamic systems. We build upon existing equation discovery systems for learning process-based models of dynamic systems from observational data, which integrate the theoretical and empirical paradigms for modelling dynamic systems: besides observed data, they take into account domain knowledge.

We apply the proposed approach and evaluate its usefulness on a set of problems of modelling population dynamics in aquatic ecosystems. Data on three lake ecosystems are used, together with a library of process-based domain knowledge. The results clearly show that ensemble models perform significantly better than their single-model counterparts.

Watershed modeling of hydrology and nutrient leaching. In year 3 of the SUMO project, we also developed a library of components for building semi-distributed watershed models. The library incorporates basic modeling knowledge that allows us to adequately model different water fluxes and nutrient loadings on a watershed scale. It is written in a formalism compliant with the equation discovery tool ProBMoT, which can automatically construct watershed models from the components in the library, given a conceptual model specification and measured data.

We applied the proposed modeling methodology to the Ribeira da Foupana catchment to extract a set of viable hydrological models. By specifying the conceptual model and using the knowledge library, different hydrological models are generated. The models are automatically calibrated against measurements, and the model with the lowest root mean squared error (RMSE) is selected as an appropriate hydrological model for the selected study area.

4.2 Applications in systems & synthetic biology

Modeling endocytosis. Given its recent rapid development and the central role that modeling plays in the discipline, systems biology clearly needs methods for automated modeling of dynamical systems. Process-based modeling is concerned with explanatory models of dynamical systems: it constructs such models from measured time-course data and formalized modeling knowledge.

In year 3 of the SUMO project, we applied process-based modeling to the task of modeling the conversion between Rab5 and Rab7 domain proteins in the cellular process of endocytosis, an example of a particularly relevant modeling task in systems biology. The task is difficult due to the limited observability of the system variables (only sums thereof can be observed with the existing measurement methods) and noisy observations. This poses serious challenges to the process of model selection. To address them, we proposed a task-specific model selection criterion that takes into account knowledge about the necessary properties of the simulated model behavior. In a series of modeling experiments, we empirically demonstrated the superiority of the proposed task-specific criterion over standard, general model selection approaches.

Automated design in synthetic biology. Equation discovery includes approaches for the automated modelling of dynamic systems, which (re)construct models of the system dynamics from its observed behavior and available domain knowledge. Such approaches have recently been used in systems biology to reconstruct the structure and dynamics of biological networks/circuits.

In year 3 of the SUMO project, we also explored the use of equation discovery approaches to construct/design biological circuits from constraints that describe their desired behavior. We first modified the state-of-the-art automated modeling approach to suit the requirements of the new task and then applied the resulting approach to two synthetic biology tasks of biocircuit design.


5 Discrete-time modeling of dynamical systems with ensemble methods

Besides learning ensembles of (continuous-time) ODE models, we have also addressed the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time. We have used the external dynamics approach, which formulates this task as the regression task of learning a difference (recurrence) equation. This model predicts the state of the system at a given discrete time point as a (nonlinear) function of the states and inputs of the system at several previous time points.
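As a small illustration of the external dynamics approach (ours, with invented variable names), a measured output y and input u can be rearranged into an ordinary regression table of lagged values:

    import numpy as np

    def lag_matrix(y, u, n_lags):
        # Build regression examples: the target is y(t), the features are
        # the previous n_lags outputs and inputs, turning the dynamic
        # modeling task into an ordinary (nonlinear) regression problem.
        rows = [np.concatenate([y[t - n_lags:t], u[t - n_lags:t]])
                for t in range(n_lags, len(y))]
        return np.array(rows), y[n_lags:]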

In particular, we propose methods for learning fuzzy linear model trees and ensembles thereof. These are used as the function approximators within the external dynamics approach. Our methods can learn trees and ensembles for predicting either a single output (one state variable) or multiple outputs (the entire system state).

We evaluate the proposed methods on a number of problems from the area of control engineering. We show that they perform well and are resilient to noise. Among the different approaches proposed, ensembles of trees for predicting multiple outputs can be recommended, since they perform best and yield the most succinct overall models. The above contributions are described in more detail in the following subsections.

5.1 Model-tree ensembles for noise-tolerant system identification

In year 3 of the SUMO project, we addressed the task of identifying nonlinear dynamic systems from measured data. The discrete-time variant of this task is commonly reformulated as a regression problem. As tree ensembles have proven to be a successful predictive modeling approach, we investigated the use of tree ensembles for solving this regression problem.

While different variants of tree ensembles have been proposed and used, they are mostly limited to using regression trees as base models. We introduced ensembles of fuzzified model trees with split-attribute randomization and evaluated them for nonlinear dynamic system identification. Models of dynamic systems built for control purposes are usually evaluated by a more stringent evaluation procedure using the output (i.e., simulation) error. Taking this into account, we perform ensemble pruning to optimize the output error of the tree ensemble models.

The proposed Model-Tree Ensemble method was empirically evaluated using input-output data disturbed by noise. It was compared to representative state-of-the-art approaches on one synthetic dataset with artificially introduced noise and one real-world noisy dataset. The evaluation showed that the method is suitable for modeling dynamic systems and produces models with output error performance comparable to the other approaches. The method is also resilient to noise, as its performance does not deteriorate even when up to 20% noise is added.

5.2 Ensembles of linear model trees for the identification of multiple-output systems

In year 3 of the SUMO project, we also addressed the task of discrete-time modeling of nonlinear dynamic systems with multiple outputs using measured data. In the area of control engineering, this task is typically converted into a set of classical regression problems, one for each output, which can then be solved with any nonlinear regression approach. Fuzzy linear model trees (known as Takagi-Sugeno models) are a popular approach in this context.

We proposed, implemented, and empirically evaluated three extensions of fuzzy linear model trees. First, we considered multi-output Takagi-Sugeno models. Second, we proposed the use of ensembles of such models. Third, we investigated the use of a search heuristic based on the simulation error (as opposed to the one-step-ahead prediction error), specific to the context of modeling dynamic systems.

We empirically evaluated and compared these approaches on three case studies, including the modeling of the inverse dynamics of a robot arm and two process-industry systems. Multi-output model trees exhibit performance comparable to a set of single-output models, while providing a more compact model. Ensembles improve the performance of both single- and multi-output trees, while the heuristic specific to modeling dynamic systems improves performance only very slightly. Overall, we can recommend the use of multi-output trees and ensembles thereof, learnt using the prediction error as a search heuristic.


6 Data streams and dynamical systems

In year 3, we also tackled the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time in a streaming context. The data stream paradigm matches the concept of modeling dynamic systems very well, as the continued observation of the state and inputs of a dynamic system over (discrete) time results in a (potentially infinite) stream of data. Within this paradigm, we can handle large quantities of data (both a large number of variables and a large or even infinite number of observations), which makes it suitable for some novel application areas (such as global systems science).

We have developed novel methods for learning ensembles of regression and model trees on data streams, such as on-line bagging and on-line random forests. We have also proposed the approach of on-line learning of model trees with options, which presents an effective compromise between learning a single model and an ensemble. We have applied these approaches to several tasks of discrete-time modeling of dynamic systems. The above contributions are described in more detail in the following subsections.
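For intuition, on-line bagging in the style of Oza and Russell replaces bootstrap replicates with Poisson(1)-distributed example weights. The sketch below is ours, not necessarily the exact variant used in P12/P13, and assumes base learners exposing a partial_fit method (e.g., scikit-learn's SGDRegressor):

    import numpy as np

    rng = np.random.default_rng(0)

    def online_bagging_update(ensemble, x, y):
        # Each base learner sees the new example k ~ Poisson(1) times, which
        # mimics the weight the example would get in a bootstrap replicate.
        for learner in ensemble:
            for _ in range(rng.poisson(1.0)):
                learner.partial_fit([x], [y])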

6.1 Ensembles of data streams

The emergence of ubiquitous sources of streaming data has given rise to the popularity of algorithms for online machine learning. In that context, Hoeffding trees represent the state-of-the-art algorithms for online classification. Their popularity stems in large part from their ability to process large quantities of data at a speed beyond the processing power of other streaming or batch learning algorithms.

Hoeffding trees have often been used as the base models of many ensemble learning algorithms for online classification. However, in contrast to the many algorithms for online classification, ensemble learning algorithms for online regression have been lacking. In particular, the field of online any-time regression analysis seems to have received very little attention.

In year 3 of the SUMO project, we also addressed this issue through a study and empirical evaluation of a set of online algorithms for regression, which included the baseline Hoeffding-based regression trees, online option trees, and an online least mean squares filter. We also designed, implemented, and evaluated two novel ensemble learning methods for online regression: online bagging with Hoeffding-based model trees, and an online random forest method in which we used a randomized version of the online model tree learning algorithm as the basic building block. Finally, in this context we evaluated the proposed algorithms along several dimensions: predictive accuracy and quality of models, time and memory requirements, bias-variance and bias-variance-covariance decomposition of the error, and responsiveness to concept drift.

6.2 Modeling dynamical systems with data streams

In year 3, we also addressed the task of modeling dynamical systems in discrete time using regression trees, model trees, and option trees for on-line regression. Modeling dynamical systems poses several challenges to data mining approaches; these motivate the use of methods for mining data streams. We proposed the FIMT-DD algorithm for mining data streams with regression or model trees, as well as the FIMT-DD-based algorithm ORTO, which learns option trees for regression. These methods were then compared on several case studies, i.e., tasks of learning models of dynamical systems from observed data. From the performed experiments, we can conclude that option trees for regression work best among the considered approaches for learning models of dynamical systems from streaming data.


References

[1] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[2] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

[3] Scott D. Cohen and Alan C. Hindmarsh. CVODE, a stiff/nonstiff ODE solver in C. Computers in Physics, 10(2):138–143, 1996.

[4] Harris Drucker. Improving regressors using boosting techniques. In Proceedings of the Fourteenth International Conference on Machine Learning, ICML '97, pages 107–115, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.

[5] Juan J. Durillo and Antonio J. Nebro. jMetal: A Java framework for multi-objective optimization. Advances in Engineering Software, 42:760–771, 2011.

[6] Yoav Freund. An adaptive version of the boost by majority algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT '99, pages 102–113, New York, NY, USA, 1999. ACM.

[7] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[8] Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.

[9] Darko Čerepnalkoski, Katerina Taškova, Ljupčo Todorovski, Nataša Atanasova, and Sašo Džeroski. The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems. Ecological Modelling, 245:136–165, 2012.

[10] David H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.


Appendix: Workpackage 7, its structure, and summary of work in year 2

WP7 Objectives. The objective of WP7 is to develop methods for computational scientific discovery that can learn complete supermodels (ensembles of ODE models) of dynamical systems. The sub-objectives of this WP include the development of techniques for the (semi)automated generation of constituent models, the selection of an appropriate subset of models, and learning the form and coefficients of the interconnections among the models.

WP7 Description of work. WP7 will develop methods for learning complete supermodels. The supermodels are expected to be built in three phases: first generate diverse models, then select a set of complementary models, and finally learn the interconnections between the constituent models of an ensemble. These three phases form the three tasks that constitute WP7:

• Task 7.1 Generate diverse set of ODE models. To generate a diverse set of models, in this task, we adapt existing approaches from the area of ensemble learning. These include taking different subsamples of the data, taking projections of the data, and using different learning algorithms (or randomized algorithms). Different subsets of domain knowledge may also be considered.

• Task 7.2 Select a complementary set of ODE models. Given a set of models, in this task, we use a measure of similarity between models to select models that are complementary. Different measures of similarity (or model performance/quality) can be considered. Besides the sum of squared errors and correlation, these can include the weighted sum of squared errors or robust statistical estimators.

• Task 7.3 Learn to interconnect ODE models. In learning the interconnections between the constituent models of the ensemble, in this task, we consider searching through the space of possible structural forms of the interconnections, coupled with parameter fitting for a selected functional form of the possible connections. For parameter fitting, we will use global optimization methods based on meta-heuristic approaches. The use of such parameter estimation methods is of crucial importance in supporting the use of different quality criteria, as well as in avoiding local optima during search.


Deliverables in WP7. The progress of each task from WP7 is reported in the planned deliverables as follows:

• D7.1 Report on the generation of a diverse set of ODE models.

This deliverable adresses task 7.1 and presents the progress on the taskof generating a diverse set of ODE models. In the Annex 1 (Descriptionof work) this deliverable was planned for month 18 of the duration ofthe project.

• D7.2 Report on the selection of a complementary set of ODE models. This deliverable addresses Task 7.2 and presents the progress on the task of selecting a complementary set of ODE models. In Annex 1 (Description of work), this deliverable was planned for month 27 of the project.

• D7.3 Report on learning to interconnect ODE models. This deliverable addresses Task 7.3 and presents the progress on the task of learning to interconnect ODE models. In Annex 1 (Description of work), this deliverable was planned for month 36 of the project.

Summary of activities from year 2. The objective of WP7 is to develop methods for computational scientific discovery that can learn complete supermodels (ensembles of ODE models) of dynamical systems. The supermodels are expected to be built in three phases: generate diverse models, select a set of complementary models, and learn the interconnections between the constituent models of an ensemble. While this decomposition allows for independent development of the different components of the approaches to learning ensembles of ODE models, we can evaluate their performance only when we integrate them into complete approaches to learning ensembles. Thus, the issues related to finding working combinations of the methods developed and proposed within D7.1 and D7.2 in year 2 are still open and are to be resolved within Task 7.3.

In Task 7.1 in year 2, we focused on generating a diverse set of ODE models. To generate a diverse set of models, we adapted existing sampling approaches from the area of ensemble learning. The specificities of the task of subsampling an observed behavior of a dynamic system were taken into account. After the different subsamples are generated, the base learner ProBMoT is applied to learn different ODE models from them.

The approaches adapted included subsampling the instance space and subsampling the feature space. Along the first dimension, we considered the selection of random sub-intervals of the observation period. We adapted bootstrap sampling and error-weighted sampling for the case of time-series data. Finally, we performed preliminary experiments (with some of these adaptations) with learning ODE models (same structure, different parameters) and included them in the preliminary deliverable targeting Task 7.3 in year 2.

Along the second dimension, we considered random sub-sampling of the variable space, as in random subspaces. We considered combining it with bootstrap sampling of the instance space. Moreover, we proposed a generalized sub-sampling approach, which simultaneously subsamples both the instance and the feature space. Finally, we also suggested considering a procedure for sampling the template entities and processes from the library of background knowledge used by ProBMoT as a way to generate a set of diverse ODE models.

In Task 7.2 in year 2, we focused on selecting a set of complementary ODE models. To address this task, we first attempted to define the notion of diversity for ODE models. While model diversity has been extensively studied in the context of ensemble models, the bulk of that work has considered the task of classification. For regression, covariance is used as a measure of the similarity/diversity of the constituent models of an ensemble. To the best of our knowledge, the notion of diversity has not been studied in the context of ensembles of ODE models.

We based our definition of diversity on similarity measures between dynamic system behaviors, i.e., multi-variate time series. These are based on similarity measures for single time series. The similarities of corresponding time-series pairs in two behaviors are aggregated into an overall similarity. We considered a range of similarity measures for time series, which we have collected and implemented. We explored their use in clustering ODE models and observed behaviours of dynamic systems. These clustering experiments are important in the context of further work within Task 7.3, where a set of appropriate models has to be chosen for inclusion in the ensemble. After clustering a larger set of models, representative models from each cluster could be chosen for combination, for example.
