
208 IEEE TRANSACTIONS ON NANOBIOSCIENCE, VOL. 3, NO. 3, SEPTEMBER 2004

Evolvable Social Agents for Bacterial Systems Modeling

Ray Paton, Richard Gregory*, Costas Vlachos, Jon Saunders, and Henry Wu, Senior Member, IEEE

Abstract—We present two approaches to the individual-based modeling (IbM) of bacterial ecologies and evolution using computational tools. The IbM approach is introduced, and its important complementary role to biosystems modeling is discussed. A fine-grained model of bacterial evolution is then presented that is based on networks of interactivity between computational objects representing genes and proteins. This is followed by a coarser grained agent-based model, which is designed to explore the evolvability of adaptive behavioral strategies in artificial bacteria represented by learning classifier systems. The structure and implementation of the two proposed individual-based bacterial models are discussed, and some results from simulation experiments are presented, illustrating their adaptive properties.

Index Terms—Adaptive behavior, artificial ecologies, individual-based modeling (IbM), network-based models, rule-based models, virtual bacteria.

I. INTRODUCTION

THE APPROACH to modeling biological systems that is presented in this paper includes evolutionary and ecological perspectives. Coupled with this is an approach from the “bottom-up” that seeks to model individual entities and processes as individuals rather than averaged aggregates. We shall explore some ways in which an individual-based approach that seeks to include ecological and evolutionary dimensions can be implemented, by describing two systems we have developed that model (at different levels of biological granularity) bacterial systems in simple environments.

Individual-based modeling (IbM) provides an important complementary approach to biosystems modeling that relies on population-based (averaging) techniques [1]. A number of groups have approached IbMs from a computational stance that makes use of such computational architectures as abstract machines (automata), (quasi-)autonomous agents (distributed artificial intelligence (AI) systems), and ALife. For example, the ecological simulators Echo [2] and Herby [3] employed rule-based AI-inspired architectures to explore the evolvability of ecosystems.

Manuscript received January 29, 2004; revised April 9, 2004. This work was supported in part by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under Grant GR/R16174/01, together with the U.K. Department of Trade and Industry (e-Science Support/ESNW) to grid-enable one simulation system (THBB/008/00 134C), in part by the MASA Group, in part by IBM Life Sciences, and in part by Esteem Computing.

R. Paton, deceased, was with the BioComputing and Computational Biology Research Group, Department of Computer Science, University of Liverpool, Liverpool L69 7ZF, U.K., and also with the Los Alamos National Laboratory, Los Alamos, NM 87545 USA.

*R. Gregory is with the BioComputing and Computational Biology Research Group, Department of Computer Science, University of Liverpool, Liverpool L69 7ZF, U.K.

C. Vlachos is with the BioComputing and Computational Biology Research Group, Department of Computer Science, University of Liverpool, Liverpool L69 7ZF, U.K.

J. Saunders is with the Microbiology and Genomics Division, School of Biological Sciences, University of Liverpool, Liverpool L69 7ZB, U.K.

H. Wu is with the Faculty of Engineering, Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool L69 3GJ, U.K.

Digital Object Identifier 10.1109/TNB.2004.833701

The systems we shall describe in subsequent sections provide environments in which artificial (virtual) bacteria are simulation models for selected aspects of real-world microbiological agents. The computational agents allow individual and adaptable changes to be traced as their lineages change over time. It is possible to explore what happens in a system in which there are very large numbers of individuals interacting with each other and their environment over time and space. Energy and material are conserved in these systems, and information has an associated cost with regard to its production, translocation, and transduction. With regard to all models, we acknowledge that there is a tradeoff between realism, precision, and generality. As such, any given architecture will emphasise some aspects of the real world and deemphasise others. We have explored a number of different agent architectures, including a fine-grained model of a cell in which individual molecules or functional agencies are represented as software objects, and a coarser grained model that looks at the ecology, adaptability, and evolvability of bacterial behavioral strategies. However, there are serious caveats in these (as in any other) modeling systems. There is a danger not only of conflating material (matter), energy, and information in artificial (virtual) simulation systems, but also of creating things from nothing. This “economic” currency-balancing requirement is critical to any approach to systems biology that needs to deal with space and time, energy and material balances, and evolutionary processes.

The reader may inquire about how the behavior of the system may be validated against real-world (e.g., experimental) approaches. It must be kept in mind that we are dealing with evolutionary timescales in these ecosystems. Experimental systems typically isolate variables (to gain control), simplify environments (as in the contrast between in vivo and in vitro experimental methods and outcomes), run for finite periods (i.e., the duration of the experiment rather than the lifetime of the system), alter the system being observed (e.g., through manipulation of parts and processes), and change so that certain states may no longer be reachable. The IbM approach we follow makes it possible to avoid these limitations. Information about all components of a system can be recorded, as well as relations between the components and changes in the relations.

II. SYSTEMS BIOLOGY FOR EVOLVING BACTERIA

Models such as the E-CELL project [4] aim to use gene data directly in a mathematical model of transcription. The Virtual Cell project [5], [6] makes use of user-defined protein reactions to simulate compartments at the nucleus and cellular level. GEPASI [7] also models protein reactions, but from within an enclosed box environment. The BacSim project [8], [9] simulates individual cell growth at the population scale. Eos [10] is also based at the population scale, but is intended as a framework for testing idealized ecologies, represented by evolutionary algorithms. These models rely on actual experimental data (which can be vague) and, more importantly, once input, that data is static. As a result of this and the highly specific goal of the models, it became clear that the required model of evolution did not exist.

1536-1241/04$20.00 © 2004 IEEE

This section describes a model known as COmputing Systems of Microbial Interactions and Communications (COSMIC I), which is a careful balance of biological and computational realities [11], with an emphasis on open-endedness [12], individuality, and a parallel implementation. The main aim of COSMIC I is evolutionary modeling based on biologically realistic organisms, by including interactions within cells involving the combined effects of enzymes and regulatory proteins acting on genes, which in turn act on those enzymes and other proteins, creating a huge number of both positive and negative feedback loops necessary for cellular control. All of the concepts being implemented use an IbM philosophy: the environment contains individual cells, each cell contains an individual genome, and each gene can lead to individual gene products, each with their own spatial and temporal parameters.

Part of the central dogma of molecular biology is that the transcription process is carried out by an enzyme specific for the creation of mRNA species on a DNA template. This enzyme (RNA polymerase) can be thought of as a complex protein-based machine. To start transcription, an RNA polymerase molecule binds at a promoter site, which demarcates a starting point in that given area of the genome; the RNA polymerase then transcribes as it moves along the genome, stopping at terminator sites. An interesting feature of this system is variation in gene expression and control. Controls may be either transcription repressors or activators, and they may themselves be controlled by other regulatory proteins.
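The promoter-to-terminator scan described above can be sketched as a simple walk along a gene list. This is an illustrative stand-in, not COSMIC I's implementation; the gene representation and names are assumptions.

```python
# Hypothetical sketch of transcription: a polymerase binds at a promoter
# and emits every gene it passes until it reaches a terminator site.

def transcribe(genome, promoter_index):
    """Return the names of genes transcribed from one binding event."""
    transcript = []
    for gene in genome[promoter_index + 1:]:
        if gene["type"] == "terminator":
            break
        transcript.append(gene["name"])
    return transcript

genome = [
    {"name": "p1", "type": "promoter"},
    {"name": "g1", "type": "gene"},
    {"name": "g2", "type": "gene"},
    {"name": "t1", "type": "terminator"},
    {"name": "g3", "type": "gene"},
]
print(transcribe(genome, 0))  # -> ['g1', 'g2']
```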

Operator regions form sites onto which repressor proteins (DNA-binding proteins) can attach. Once attached, the mRNAs encoded by that region are not transcribed and, hence, the protein is not manufactured. If an inducer molecule is present, it can bind to the repressor molecule and nullify the effect of the repressor. There are also corepressors, whereby the repressor will only bind to the operator region when it has already combined with another cellular component of the required type. Repressing or activating proteins reflect both the genotypic and environmental state of a bacterial cell, creating rapid, complex processing.
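The repressor, inducer, and corepressor interactions above amount to a small decision table, which can be written out explicitly. The function and its defaults are illustrative assumptions, not part of any described implementation.

```python
# Illustrative truth-table for operator-site control: a repressor blocks
# transcription unless an inducer has bound it, and a corepressor-
# dependent repressor only blocks once its corepressor is present.

def operator_blocked(repressor_present, inducer_present=False,
                     needs_corepressor=False, corepressor_present=False):
    """Return True if the operator region is blocked (gene not transcribed)."""
    if not repressor_present:
        return False
    if inducer_present:          # inducer nullifies the repressor
        return False
    if needs_corepressor:        # corepressor must bind before repression
        return corepressor_present
    return True

assert operator_blocked(True) is True
assert operator_blocked(True, inducer_present=True) is False
assert operator_blocked(True, needs_corepressor=True) is False
assert operator_blocked(True, needs_corepressor=True,
                        corepressor_present=True) is True
```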

A. Model Incorporating the Genome and the Proteome

Fig. 1. Enzyme interaction types.

There is a direct mapping of these biological concepts into an object model that supports evolutionary processes. The key goal is not accuracy (which is clearly impossible in an evolving system that would need constant updating of parameters from real-world data that does not exist), but expressibility, which amounts to a framework for evolution. Looking at genome maps¹ shows that the strings are divided into nonuniform lengths, each of these identifying some gene or other active string sequence. Sequences can be broadly categorized into those that encode an enzyme and sequences that act as regulatory structures on which enzymes (or further nucleic acid sequences) act [13], [14]. As a result of this, the genome in COSMIC I dynamically encodes eight broad sequence types derived from those present on the lac and trp operons and allows for multiple interaction types of single genes. These eight sequence types are the DNA binding sites, promoters, operators, attenuators, and receptors [including flagella activation protein sequences (FAPs)], and the protein agents’ inducers, repressors, and RNA polymerase (whose specificity is determined by sigma factors). Receptors provide the interface between the optional transcription cascade and the cell’s external environment. Of these eight, there are four fixed types: promoters, operators, attenuators, and a type that marks a gene as transcribable but with as yet unknown purpose. Gene–gene and gene–enzyme interactions then follow from pairing fixed types to these as yet unknown types that were originally defined to be only some kind of transcribable gene. These relationships are shown in Fig. 1. Interaction pairing is based on the antimatch of the DNA-like encoded strings contained in each gene object and so responds to the changing genome, but is also stable over time when the genome is stable.
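The antimatch pairing of DNA-like strings can be sketched as a base-complement test. The exact encoding and matching criterion in COSMIC I are not specified here, so this full-complement rule is an assumption for illustration.

```python
# Sketch of "antimatch" pairing: two gene objects interact when one
# encoded string is the base-complement of the other.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def antimatch(seq_a, seq_b):
    """True if seq_b is the exact base-complement of seq_a."""
    return (len(seq_a) == len(seq_b) and
            all(COMPLEMENT[x] == y for x, y in zip(seq_a, seq_b)))

assert antimatch("ATGC", "TACG")
assert not antimatch("ATGC", "ATGC")
```

Because the test depends only on the strings themselves, pairing tracks genome mutations automatically yet stays stable while the genome is stable, as the text notes.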

¹E.g., the E. coli database found at http://www.ecocyc.org/

Fig. 2. Conceptual view of a cell.

COSMIC I realizes the genome and gene products using the IbM approach. This allows for the inclusion of spatial effects [15] without imposing artificial spatial boundaries of fixed resolution, and also avoids modeling mean quantities. Given a population of functional polypeptides in the cell, each has a chance of reacting with each other and on the genome. The probability involved is based on their type as defined by the genome, their position, their half-life, and their age. Each reaction lasts for a variable time based on the type of gene pairing. The goal of this probability distribution is to be unbiased, giving each of these attributes a chance to have some effect without deliberately allowing any single attribute to dominate. At this point, it must be pointed out that simplification of the real biological process has led to COSMIC I having no explicit molecule type on which enzymes react. A single type of generic pseudoprotein is assumed to exist and all reactions act on that pseudoprotein. This protein can be bound by enzymes, as well as the normal process of enzymes acting on each other. The resulting effect is that the enzymes and the single FAP type are not available to do anything else while bound, allowing another molecule a chance to bind.
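One way to combine the four attributes named above (type, position, half-life, age) into an unbiased per-pair reaction probability is to multiply bounded factors, so that no single attribute dominates. The formula and weights below are illustrative assumptions, not COSMIC I's actual distribution.

```python
# Hypothetical composition of a reaction probability from the four
# attributes described in the text; each factor maps into (0, 1].

import math
import random

def reaction_probability(affinity, distance, half_life, age):
    """Product of bounded factors keeps any one attribute from dominating."""
    p_type = affinity                    # 0..1 from the gene-pairing type
    p_dist = math.exp(-distance)         # nearer molecules react more often
    p_decay = 0.5 ** (age / half_life)   # older molecules decay away
    return p_type * p_dist * p_decay

random.seed(1)
p = reaction_probability(affinity=0.8, distance=0.5, half_life=30.0, age=10.0)
reacts = random.random() < p             # sample one candidate reaction
```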

The fully specified COSMIC I model [16], [17] consists of a hierarchy of sets or objects. The most basic set is the gene, which is part of a genome set, which is part of a cell set as part of the environment. Each level contains additional parameters and attributes associated with that level; for instance, genes and genomes contain spatial information that partly computes reaction probabilities. The cell set also contains the enzyme set, which is populated by gene products of the genome. A conceptual view of the components in an individual cell is shown in Fig. 2.
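The set hierarchy described above (gene within genome, genome within cell, cell within environment) can be sketched directly as nested containers. Field names here are illustrative, not COSMIC I's actual object model.

```python
# Minimal sketch of the gene / genome / cell / environment hierarchy,
# with spatial information kept at the gene level as the text describes.

from dataclasses import dataclass, field

@dataclass
class Gene:
    sequence: str
    position: float                      # spatial info feeding reaction probabilities

@dataclass
class Genome:
    genes: list = field(default_factory=list)

@dataclass
class Cell:
    genome: Genome
    enzymes: list = field(default_factory=list)   # gene products of the genome

@dataclass
class Environment:
    cells: list = field(default_factory=list)

env = Environment(cells=[Cell(genome=Genome(genes=[Gene("ATGC", 0.0)]))])
assert env.cells[0].genome.genes[0].sequence == "ATGC"
```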

When specified for all genes, these relations between genes and gene products give a picture of which gene product (enzyme) can react. This creates interaction paths between control sequences and gene products, thus allowing these gene products to then be called sigma factors, repressors, and inducers. Given a genome composed of genes, the gene products of each, and a set of relations describing what can happen, it is only necessary to take account of the set of reactions that are happening. This determination is based on the relations of possible reactions—each relation carries with it a pair of reaction instances.

COSMIC I records in full detail the cascade of interactions between gene products and genes. Fig. 3 shows a small part of the interaction that occurred within cell 0143 over its lifetime. The meaning of an interaction is based on the types of node sharing a common edge. Edges show relationships between these interactions, either transcriptional relations (adjacent genes) or control relationships found in Fig. 1. These graphs then allow us to examine which interaction paths were most heavily used, a measure of the network depth.

Fig. 3. Subset of network interactions over the lifetime of cell 0143. Diamonds represent glucose receptors; boxes represent outputs that lead to activation of the flagellum. Ovals represent all other types, namely, expressible genes. Inside each node is the particular gene sequence given to that gene and the gene type in shortened form, the long form of which is given in Fig. 1. Edges show a relationship between the particular genes or gene products. This includes reflexive edges, which indicate an inhibited RNA polymerase. Figures on the edges show total usage counts for binding and unbinding reactions.

B. The Environment

The largest scale in COSMIC I is the environment. An environment is clearly needed to simulate bacteria, by providing the tool by which fitness can be implicitly defined. Fitness is defined here as a measure of genome convergence toward the ideal of following a food gradient. However, the changing nature of the environment ensures that there is no single ideal and that any better solutions may become inferior over time.

The population of cells must compete side by side for what exists in the common environment. To this end, all cells live in a glucose-rich environment that is depleted over time by the cells [8]. Better cells in better regions of the environment will grow faster and so multiply faster. The combination of input reward based on cell position and cell position based on flagellum output produces an indirect reward-based system that is the basis on which the simulated bacteria evolve.
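The indirect reward loop above can be sketched as cells depleting a shared glucose field and converting local uptake into growth energy. The grid size, uptake, and growth rates below are illustrative assumptions, not COSMIC I's parameters.

```python
# Sketch of the glucose-depletion reward loop: a cell's energy gain
# depends on the glucose at its position, which movement determines.

glucose = [[4.5 for _ in range(4)] for _ in range(4)]   # mg per grid site

def step(cell, uptake=0.5, growth_per_mg=1.0):
    """One coarse time step: consume local glucose, convert it to energy."""
    x, y = cell["pos"]
    eaten = min(uptake, glucose[y][x])
    glucose[y][x] -= eaten                # shared field is depleted over time
    cell["energy"] += eaten * growth_per_mg
    return eaten

cell = {"pos": (1, 1), "energy": 0.0}
step(cell)
assert glucose[1][1] == 4.0 and cell["energy"] == 0.5
```

Because cells at richer sites gain energy faster, and energy drives division, fitness is defined only implicitly by the environment, as the text describes.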

The view of the environment is presented in Fig. 4. White equates to a glucose level of 4.5 mg; black equates to absence of glucose. This area is 0.2 mm × 0.2 mm. Black circles represent bacteria that have not moved through lack of connection with their flagella; gray streaks show moving bacteria. Per-cell glucose use has been exaggerated to better motivate evolutionary change.

At this early stage in the simulation, the genomes still have little order; having grown for only around nine generations, there has been little genetic mutation. As a result, the evolved behavior is limited to uncontrolled movement. The most motile cells leave a lighter trail of glucose use and can be picked out visually, such as cell 0193 in the top middle. Moving more slowly is cell 0246 on the bottom left. Failed motility can be seen in cell 0252 on the left middle. Fig. 4 also shows cells that have since been removed through death, leaving only the environmental effects.


Fig. 4. Environmental change after 168 min.

It must be kept in mind that evolvability is the goal of COSMIC I, not prediction of known behavior. Briefly, we describe evolvability as the capacity of the population of agents to undergo change in its heritable material that is transferred to the next generation. In the longer term, these changes relate to genomes, whereas from one cell cycle to the next we also include cell state. The generality needed for evolution requires that the simulation set genetically determined parameters (e.g., transcription rates or reaction rates) as either random or unbiased. Verification then becomes a moot point. In the real biological case, there is nothing to compare the system with. Some data is available on mutation rates, on genome sequences after mutation, and on phenotypes of mutants. This data is disparate, not only in the sense of connecting it together in the real-world case, but also in then relating it to the simulation.

To enable a reasonable computation time for a given evolutionary time frame, COSMIC I executes using a parallel architecture [16] under the client–server paradigm. The client–server paradigm is used at each coarse-grained time step, each step being the chance to synchronize past events between the server (the environment) and clients (cells). The link here is in biological time rather than wall-clock time, which ensures all cells execute for as long as they require for any one time step.
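The coarse-grained synchronization described above can be sketched serially: each step, every client computes its own interval of biological time (taking as much wall-clock time as it needs), and the server commits the step only once all updates are in. This is a simplified stand-in for the PVM-based setup; all class and method names are assumptions.

```python
# Serial sketch of per-step client-server synchronization in
# biological time; in the real system clients run in parallel.

class Env:
    """Server role: owns the shared environment clock."""
    def __init__(self):
        self.time = 0.0

    def commit(self, updates):
        # Step completes only after every client has reported.
        self.time += max(updates)

class CellStub:
    """Client role: advances itself by one biological interval."""
    def advance(self, dt):
        return dt

def run(environment, cells, steps, dt):
    """Advance all cells by the same biological interval per step."""
    for _ in range(steps):
        updates = [cell.advance(dt) for cell in cells]   # clients compute
        environment.commit(updates)                      # server synchronizes

env = Env()
run(env, [CellStub(), CellStub()], steps=3, dt=1.0)
assert env.time == 3.0
```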

As should be expected, the slowly evolving population makes use of the dynamic genome size, duplication, and deletion events, creating a much more diverse genome size. For each cell, genome size is initialized to within 70–120 genes (total of both control genes and transcribable genes), with a mean of 95 genes. As can be seen in Fig. 5, the diversity increases without any one size dominating, to end with a maximum genome of 1000 genes in one individual cell, and a minimum of 36 genes (a limit imposed by the implementation). The final mean of 160 suggests there is pressure for longer genomes, possibly caused by the relatively high mutation rate in genome size. Here, the bars show the individual cell minimum, mean, and maximum genome size over its lifetime.

Fig. 5. Genome size distribution.

Fig. 6. Enzyme concentration distribution.
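The gene duplication and deletion events that drive this size diversity can be sketched per gene, with the implementation-imposed lower bound of 36 genes from the text enforced. The event rates are illustrative assumptions.

```python
# Sketch of per-gene duplication/deletion mutation of genome size.

import random

MIN_GENES = 36   # lower limit imposed by the implementation, per the text

def mutate_size(genome, dup_rate=0.05, del_rate=0.05, rng=random):
    """Return a new gene list with tandem duplications and deletions."""
    out = []
    for gene in genome:
        r = rng.random()
        if r < dup_rate:
            out.extend([gene, gene])          # tandem duplication
        elif r < dup_rate + del_rate:
            continue                          # deletion
        else:
            out.append(gene)
    # Enforce the minimum genome size by rejecting over-shrunken offspring.
    return out if len(out) >= MIN_GENES else genome

random.seed(0)
g = mutate_size(list(range(95)))              # initial mean of 95 genes
assert len(g) >= MIN_GENES
```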

In parallel with the genome size, the variance in gene products per cell also increases over the course of the simulation. The high variance (bars representing minimum, mean, and maximum enzyme concentration over the lifetime of the cell) shown in Fig. 6 suggests gene products are controlled by optional transcription. This is itself constrained during cell division, where a high sensitivity to the division of gene products can massively change the production of further gene products. For new cells with no parent, enzymes are initialized at five enzymes per transcribable gene. This can only be seen early on, when cells have a similar number of enzymes because they are being created to maintain a population of 20 cells. This is also the cause of the noise in the mean number of enzymes and the large number of bars (cells) in the first 100 min. Two hundred minutes into the simulation, a viable and much more stable cell is found through testing of random genomes. At this point, the diversity in enzyme populations increases through the majority of cells. This is to be expected when considering that the increasing majority of these later cells are genetically related to the first stable cell (cell number 0143). The stable mean is then to be expected, with large variations caused by cell division and transcription cascade failure. At the end of the simulation, the mean enzyme concentration over all cells was 10 000, the minimum was 0, and the maximum was 100 000 enzymes, presumably the result of a futile cycle, but also ensuring that cell division would not break a vital pathway by removing a necessary enzyme. As a result, these graphs manage to show that there is some kind of evolutionary pressure pushing the cells to be more robust when subjected to cell division.

We believe that COSMIC I represents one of the most all-inclusive simulation systems of bacterial evolution yet available, incorporating biological processes that normally exist in separate models (i.e., transcription cascades, ecology of multiple organisms, and evolution of these organisms). Included are interactions within and between the genome and proteome from the bottom up, with no predefined transcription paths. Each participating object is not only encoded as a software object, but is also directly mappable to realistic proteins and genes (though requiring some degree of simplification regarding the relevant biophysical processes). Although less complex than their biological analogues (that is, in terms of types), these collections enable a collective or systemic observation as well as the individual (object-based) observation. Above the viewpoint of the cell is the collection of cells in the environment, which gives the most high-level view. Evolvability, not prediction, is the end goal, and these different viewpoints give COSMIC I scope to record and find root causes, with the possibility of then inferring the same causes in the real bacterial analogue.

COSMIC I has been tested on a Linux farm and a 12-node dual-processor Athlon XP 2000 Beowulf cluster, both running under a parallel virtual machine (PVM). On the latter, it produced 3 GB per day of simulation data. A run of a week simulated approximately two days of bacterial life and tested up to 490 simultaneous cells. As we continue to develop COSMIC, we are also conscious that it will be necessary to explore greater computing power and to simplify or abstract behaviors of the individual agents. In the next section, we describe one way this has been addressed, using an evolvable rule-based bacterial modeling method.

III. FROM SYSTEMS TO LOGICS—A RULE-BASED APPROACH TO BACTERIAL EVOLVABILITY

This approach to bacterial modeling is different from the one adopted in COSMIC I, in that the environment/organism interactions are modeled using a rule-based architecture. The derived system is known as RUBAM, which stands for “RUle-based BActerial Modeling.” RUBAM is not intended to be an accurate model of any specific real-world ecologies or bacteria, but rather to capture salient features of their behaviors, in order to be capable of exhibiting adaptive and evolvable patterns. Other models with similar features have been developed in the past (see, for example, Echo [2], [18], and Herby [3]). A brief description of the structure and implementation of the proposed model is given below.

A. Structure of the RUBAM System

The RUBAM system consists of a number of fundamental elements, which work together to construct an artificial ecology. The most important of these are the artificial environment, a population of artificial organisms interacting with the environment, and a set of evolutionary operators.

The artificial environment is represented by an n-dimensional grid consisting of a number of uniformly distributed sites. This grid acts as a “container” in which the artificial organisms are placed to survive, interact, multiply, and evolve. In two dimensions, the grid forms a rectangle, while in three dimensions it forms a cuboid. Higher dimensions are also possible.

The environment contains a number of resources in various concentrations, arranged in a specific way that is a function of our problem objectives. The resources provide the “energy” that is necessary for the organisms to sustain life. The system is designed to conserve material and energy, which are conflated for the purpose of tractability. Most importantly, the resources (in certain combinations of concentrations) can trigger different types of behavior in the organisms. This behavior can be destructive to a given organism, or it can generate a desired effect, in which case the organism gradually accumulates enough energy to multiply and generate copies of itself in the environment. In this way, those organisms which are able to better exploit the resources have a higher chance of propagating through to successive generations and maximizing their presence in the population.
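A two-dimensional instance of the resource grid described above can be sketched as a lattice of sites, each holding its own per-resource concentrations. The representation and resource names are assumptions for illustration, not RUBAM's data structures.

```python
# Sketch of a RUBAM-style resource grid: uniformly distributed sites,
# each with independent resource concentrations that organisms deplete.

def make_grid(width, height, resources):
    """Build a height x width grid; each site gets its own resource dict."""
    return [[dict(resources) for _ in range(width)] for _ in range(height)]

grid = make_grid(3, 2, {"sugar": 1.0, "toxin": 0.1})
grid[0][0]["sugar"] -= 0.4          # an organism feeds at site (0, 0)
assert abs(grid[0][0]["sugar"] - 0.6) < 1e-9
assert grid[1][2]["sugar"] == 1.0   # other sites are unaffected
```

Higher-dimensional grids would follow the same pattern with additional nesting.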

B. Artificial Organisms

The artificial organisms are implemented using the concept of a learning classifier system (LCS). More details on LCSs and their applications can be found in [19] and the references therein. LCSs are rule-based systems (also known as agents or, in this case, virtual bacteria), which are able to receive “messages” coming from external sources (the environment or other organisms present in their neighborhoods), process them in some way, and generate appropriate actions that can affect the organisms themselves, as well as the outside world. The “genetic material” of each organism consists of a collection of rules that map its “inputs” (known as detectors) to its “outputs” (known as effectors). This set of rules determines the behavior of the organism when subjected to different types of stimuli. The objective is to modify the rules in such a way that a desired behavior is obtained. In the context of evolution, “desired behavior” is one which maximizes the lifespan of an individual and guarantees reproductive success. An illustration of the most important elements of an artificial organism in RUBAM is shown in Fig. 7, and a brief description of each element is given below.
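The detector-to-effector rule mapping can be sketched with the standard ternary LCS condition notation, where “#” is a wildcard. The specific messages, conditions, and actions below are illustrative assumptions, not RUBAM's rule set.

```python
# Minimal sketch of a classifier rule base mapping detector messages
# (bit strings) to effector actions, first matching rule wins.

def matches(condition, message):
    """A condition matches when every non-'#' position agrees."""
    return all(c in ("#", m) for c, m in zip(condition, message))

rules = [
    ("1#0", "move"),    # e.g., food ahead, any gradient, low energy
    ("0##", "tumble"),  # e.g., no food detected locally
]

def act(message):
    for condition, action in rules:
        if matches(condition, message):
            return action
    return "idle"

assert act("100") == "move"
assert act("010") == "tumble"
assert act("111") == "idle"
```

Evolution then amounts to modifying this rule list so that the actions taken maximize lifespan and reproductive success.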

1) Detectors: The detectors enable the organism to receive messages from the outside world. The messages come primarily from resources that exist in the site in which the organism currently resides. Furthermore, the organism is designed to be aware of the gradients of the resource surfaces in its neighborhood, and also to be aware of its current energy level. By “awareness” we mean that the organism’s actions are not only functions of the resource concentrations, but also of their gradients and the organism’s energy level. This has the potential of enabling the organism to adopt different strategies as its energy level rises or falls, and as the resource landscapes become steeper or flatter.
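A detector of this kind can be sketched as follows. The field names (`pos`, `energy`, `resource`) and the central-difference gradient are our own illustrative choices, not the paper’s actual implementation:

```python
def detector_message(env, organism):
    """Assemble the organism's input message from its local conditions:
    the resource concentration at its site, a finite-difference gradient
    over neighboring sites, and the organism's own energy level."""
    x, y = organism["pos"]
    res = env["resource"]
    # Central-difference gradient; assumes the organism is not on the edge.
    grad = (res[x + 1][y] - res[x - 1][y], res[x][y + 1] - res[x][y - 1])
    return {"concentration": res[x][y], "gradient": grad,
            "energy": organism["energy"]}

# Toy environment whose concentration grows toward the far corner.
env = {"resource": [[float(x + y) for y in range(5)] for x in range(5)]}
org = {"pos": (2, 2), "energy": 0.6}
msg = detector_message(env, org)
```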

2) Rule-Based Classifier System: The messages received by the detectors are sent to a rule-based classifier system (an LCS), where they are matched to the antecedents of the rules in the



Fig. 7. Artificial (virtual) organism in the RUBAM system, with its most important elements.

LCS’s rule base. This rule base constitutes the “genetic material” of the organism and directly affects its behavior in response to those messages. Loosely speaking, the received messages are being classified in terms of their effect on the organism’s survival. Based on this classification, the organism can then take appropriate actions by means of its effectors, in order to maximize its lifespan and reproductive success. There are many ways in which the rule matching can be performed. In this work, a fuzzy classifier system was used. This choice was based on the authors’ experimental results, which showed that a fuzzy inference system generates more stable and well-behaved populations.

3) Energy Storage and State Manager: Each organism contains an energy reservoir, which accumulates energy from the resources and then uses it to sustain life by performing certain tasks. The level of energy stored inside an organism is bounded within a fixed interval and controls the ability of that organism to receive messages and the way in which it responds to those messages. This is the purpose of the state manager, which monitors the organism’s internal energy level and brings the organism to a number of different states. For example, if the energy level is too low, the organism is unable to detect messages or perform actions, and is said to be in the inactive state. Conversely, if the energy level is sufficiently high, the organism is able to initiate the division process, and may eventually divide into two copies of itself. A small amount of mutation is applied to the LCS of one of the two offspring, and this is the main mechanism for evolution.

4) Effectors: The effectors receive commands from the consequents of the matched rules in the LCS’s rule base, and perform appropriate actions. Typical types of actions include movement to a different neighboring site in the environmental grid, change of internal energy level (positive or negative), and resource generation.

Certain actions can directly affect the environment (i.e., resource generation), while others can also modify the organism itself (i.e., movement, change of energy level). Resource generation is an important action in that it enables the organisms to “communicate” with each other. This is possible because an organism is potentially capable of receiving messages generated from resources that another organism has produced. Evidence of coordinated behavior among certain types of bacteria has been observed in real ecological systems and is known as quorum sensing. This mechanism enables certain bacteria to regulate their population densities and actions by detecting the presence

Fig. 8. Artificial organism’s different states.

of other bacteria in their neighborhoods. The RUBAM system has the potential to evolve behavioral patterns similar to quorum sensing.

Evolution takes place by means of a set of evolutionary operators. These operators are stochastic in nature, and alter the behavior of the organism in some way by modifying its rules. There are several schemes that could be employed, with mutation (in various forms) being the fundamental evolutionary operator. The behavior of an organism can also be altered by modifying the “importance” of each of the rules in the corresponding LCS, based on their observed performance. A popular mechanism for doing this is the bucket-brigade algorithm [20]. Other reinforcement learning algorithms [21] can also be employed.

Another important element in RUBAM is the choice of logic that is used to map the messages received by the detectors to actions taken by the effectors. The proposed system is capable of using both traditional (Aristotelian) logic as well as fuzzy logic [22]. In this work, fuzzy logic was employed because it was experimentally found that it generally resulted in faster convergence to “well-behaved” rule bases containing fewer rules than using traditional logic. Furthermore, fuzzy logic generally results in rule bases that are easier to comprehend and express in natural language. The inference method used to drive the effectors of the organisms was the one proposed in [23].
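To make the Mamdani-style inference of [23] concrete, the sketch below maps a single detector reading (a normalized resource concentration) to a single effector output using two hand-written fuzzy rules, min implication, max aggregation, and centroid defuzzification. The membership functions and rules are illustrative assumptions; the RUBAM rule bases are evolved, not hand-written:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def mamdani_step(conc):
    """Map a normalized resource concentration in [0, 1] to a movement
    tendency in [0, 1]: min implication, max aggregation, and centroid
    defuzzification over a discretized output universe."""
    # Rule 1: IF concentration is LOW  THEN tendency is HIGH (keep moving).
    # Rule 2: IF concentration is HIGH THEN tendency is LOW  (stay put).
    w_low = tri(conc, -0.5, 0.0, 1.0)    # firing strength of rule 1
    w_high = tri(conc, 0.0, 1.0, 1.5)    # firing strength of rule 2
    num = den = 0.0
    for i in range(101):
        y = i / 100.0
        mu = max(min(w_low, tri(y, 0.0, 1.0, 1.5)),    # clipped HIGH set
                 min(w_high, tri(y, -0.5, 0.0, 1.0)))  # clipped LOW set
        num += y * mu
        den += mu
    return num / den if den else 0.5
```

With this rule set, a low concentration yields a strong tendency to move, and a high concentration a weak one, which is the qualitative behavior a foraging organism needs.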

C. Different States of Organisms

Depending on its energy level, an organism may be in one of four different states, which can affect the way in which it responds to messages and performs actions. These are the inactive state, the active state, the division preparation state, and the division state.

The range of possible energy levels is divided into four regions, each one corresponding to one of the four states. Fig. 8 illustrates the range of possible energy levels, partitioned into regions of different states, showing typical energy level values for the boundaries between states.
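The state manager therefore reduces to a threshold lookup on the energy level. The numerical thresholds below are illustrative placeholders on a normalized [0, 1] scale; the actual boundary values of Fig. 8 may differ:

```python
# Illustrative threshold values; placeholders for the boundaries in Fig. 8.
E_INACTIVE = 0.1   # below this, the organism is inactive
E_PREPARE = 0.7    # above this, division preparation begins
E_DIVIDE = 0.9     # above this, the organism divides

def organism_state(energy):
    """Map an energy level to one of the four states of Fig. 8."""
    if energy < E_INACTIVE:
        return "inactive"
    if energy >= E_DIVIDE:
        return "division"
    if energy >= E_PREPARE:
        return "division preparation"
    return "active"
```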

1) Inactive State: When the organism is in this state, it cannot receive messages from the environment and cannot perform actions. The organism enters this state when its energy level has dropped below an inactivity threshold (shown in Fig. 8) and is no longer sufficient for the organism to function normally. When in this state, the organism is said to be inactive.

2) Active State: This is the state in which the organism is expected to spend most of its lifetime. In this state, the organism has sufficient energy to function normally (i.e., it is able to respond to messages and perform actions).

3) Division Preparation State: Once the energy level of an organism exceeds a division preparation threshold (shown in Fig. 8), the organism enters the division preparation state. In this state, the division process is initiated, and the organism prepares to divide. This state is usually functionally similar to the active



Fig. 9. Division process and mutation operator.

state, although in some cases it may inhibit the organism from performing certain actions.

4) Division State: This is a transient (unstable) state. Once the energy level of an organism exceeds an upper division limit (shown in Fig. 8), the organism is able to successfully complete the division process, and is immediately split into two copies of itself. Each of these copies contains approximately half the energy of the “parent” organism and, depending on the choice of the three state thresholds, the two copies are quickly brought back to one of the previous states, most likely the active state. This means that an organism usually stays in the division state for only one time step.

D. Mutation Operator

Once an organism manages to accumulate enough energy to reach the division state, it immediately divides into two other organisms, which are identical copies of their “parent.” In order to enable the organisms to evolve, a small amount of mutation is applied to one of the two “offspring,” so that the original genetic material is preserved, and at the same time new genetic material is introduced into the population. The mutation operator is similar to that of standard genetic algorithms. Each of the symbols that constitute the rule base of the organism to be mutated is altered with a given mutation probability. The mutation probability is usually kept very small, and is often chosen as a function of the rule base size. Fig. 9 illustrates the division process and the application of the mutation operator to one of the offspring.
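The division-plus-mutation step can be sketched as follows, encoding a rule base as a flat string over the classic ternary LCS alphabet {0, 1, #}. The encoding and the `divide` interface are illustrative assumptions, not the paper’s actual data structures:

```python
import random

def divide(parent, mutation_rate=0.01, symbols="01#", rng=random):
    """Split a parent into two offspring, each receiving about half the
    parent's energy. Mutation is applied to exactly one offspring, so
    the original genetic material survives in the other."""
    half = parent["energy"] / 2.0
    clone = {"rules": parent["rules"], "energy": half}
    mutated = "".join(rng.choice(symbols) if rng.random() < mutation_rate else s
                      for s in parent["rules"])
    return clone, {"rules": mutated, "energy": half}

parent = {"rules": "01#10#01", "energy": 1.0}
offspring_a, offspring_b = divide(parent)
```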

E. Simulation Results

Fig. 10 shows a snapshot of a typical RUBAM simulation run, on a two-dimensional grid (i.e., a rectangle) consisting of 100 × 100 sites, with an initial population size of 50 bacteria. Dark-to-light areas correspond to low-to-high concentrations of resources. Organisms with moderate energy levels (in the active state) are marked with small circles, those with high energy levels (in the division preparation state) are marked with large circles, and those in the inactive state are marked with crossed circles. Two different resource types are shown in Fig. 10, with the first one occupying the top left region of the environment (wide bell-shaped surface), and the second one occupying the

Fig. 10. Snapshot of the environment taken during a simulation run of the RUBAM system.

top right and bottom left regions (narrow bell-shaped surfaces). In this simulation, the organisms have evolved ways to “climb” the hills formed by the resources, in order to reach nutrient-rich areas in the environment, which in turn enables them to grow and reproduce. The paths followed by some of them can clearly be observed in Fig. 10, leading to areas of high resource concentrations. Upon examination of the rule bases of the fittest organisms in the population, it was observed that the evolved behavior closely resembles that of a typical hill-climbing optimization algorithm, namely, movement in the direction of ascending gradients of the resource surfaces. This result, although relatively simple, demonstrates the ability of RUBAM to evolve behavioral patterns that are applicable to the solution of real-world problems, such as function optimization problems. Other RUBAM simulations have generated strategies that can help locate multiple optimal solutions in multimodal search spaces. Furthermore, RUBAM allows the use of complex environments of higher dimensions, thus enabling the investigation of abstract ecologies that may not necessarily exist in the natural world, but can be used to solve particular problems.
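The evolved behavior is equivalent, in effect, to the following greedy gradient-ascent sketch. This is a plain reimplementation of the observed strategy for illustration, not the evolved rule base itself:

```python
def climb_step(surface, pos):
    """One greedy step toward the highest-valued neighboring site;
    stays put when no neighbor improves on the current site."""
    x, y = pos
    n = len(surface)
    best = pos
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n and surface[nx][ny] > surface[best[0]][best[1]]:
            best = (nx, ny)
    return best

# Toy surface with a single peak at (5, 5); value falls off with distance.
surf = [[-abs(x - 5) - abs(y - 5) for y in range(11)] for x in range(11)]
pos = (0, 0)
for _ in range(20):          # enough steps to reach the peak
    pos = climb_step(surf, pos)
```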

It should be stressed at this point that RUBAM is neither a genetic algorithm (GA) approach [24], nor a genetic programming (GP) approach [25]. There is no external objective function being optimized as such, but rather a fitness measure that is internal to the individual organism. In other words, the fitness of an individual can only be assessed by subjecting it to an environment and letting it interact with it for some time (generally until its energy reserves are exhausted). In contrast, in many standard GA/GP approaches the fitness of an individual can immediately be computed by means of a known and static objective function. Initial results have shown that RUBAM can generate solutions which are fundamentally (and favorably) different from those of traditional GA/GP approaches.



A disadvantage of the proposed approach—as with most evolutionary approaches—is that it is computationally intensive. The possibility of implementing RUBAM in multiprocessor computer environments is currently being investigated.

IV. CONCLUSION

We have discussed IbM in highly computer-intensive projects concerned with modeling ecological and evolutionary processes in artificial bacteria. The IbM approach has been presented as a way of carrying out virtual experiments that are not possible in the laboratory. Our approach provides a biologically plausible method for exploring evolutionary “strategies” and how genomes might evolve. This is achieved without imposing anthropocentric constraints, but using biologically relevant input. We are not seeking to best-guess by putting all genetic inputs in at the beginning (that is, developing a monolithic agent construction). A purely biological approach would preclude nondestructive analysis and tracking of phenotypic behavior and genotypic change through multiple generations of a single-celled “organism,” as is possible in our simulations. In this virtual world, we can retain in computer memory the virtual genome of any single ancestral cell for reference purposes and to reseed further simulations. In the real world, with a single-celled organism, this transient information cannot be retained because the two daughter cells are produced at the “expense” of the parent. Moreover, the advantage of our systems is that rates of evolutionary change can be accelerated in silico beyond the maximum generation times possible with even the fastest growing bacteria.

REFERENCES

[1] D. L. DeAngelis and L. J. Gross, Individual-Based Models and Approaches in Ecology: Populations, Communities and Ecosystems. New York: Chapman & Hall, 1992.

[2] P. T. Hraber, T. Jones, and S. Forrest, “The ecology of echo,” Artif. Life, vol. 3, pp. 165–190, 1997.

[3] P. Devine and R. C. Paton, “Biologically inspired computational ecologies: A case study,” in Lecture Notes in Computer Science, Evolutionary Computing, D. Corne and J. L. Shapiro, Eds. Heidelberg, Germany: Springer-Verlag, 1997, vol. 1305, pp. 11–30.

[4] M. Tomita, K. Hashimoto, K. Takahashi, T. S. Shimizu, Y. Matsuzaki, F. Miyoshi, K. Saito, S. Tanida, K. Yugi, J. C. Venter, and C. A. Hutchison III, “E-CELL: Software environment for whole-cell simulation,” Bioinformatics, vol. 15, pp. 72–84, 1999.

[5] J. Schaff, C. C. Fink, B. Slepchenko, J. H. Carson, and L. M. Loew, “A general computational framework for modeling cellular structure and function,” Biophys. J., vol. 73, pp. 1135–1146, 1997.

[6] J. Schaff and L. M. Loew, “The virtual cell,” in Proc. Pacific Symp. Biocomputing, vol. 4, 1999, pp. 228–239.

[7] P. Mendes, “GEPASI: A software package for modeling the dynamics, steady states and control of biochemical and other systems,” Comput. Appl. Biosci., vol. 9, pp. 563–571, 1993.

[8] J.-U. Kreft, G. Booth, and J. W. T. Wimpenny, “BacSim, a simulator for individual-based modeling of bacterial colony growth,” Microbiology, vol. 144, pp. 3275–3287, 1998.

[9] J.-U. Kreft, C. Picioreanu, J. W. T. Wimpenny, and M. C. M. van Loosdrecht, “Individual-based modeling of biofilms,” Microbiology, vol. 147, pp. 2897–2912, 2001.

[10] E. Bonsma, M. Shackleton, and R. Shipman, “Eos—An evolutionary and ecosystem research platform,” BT Technol. J., vol. 18, pp. 24–31, 2000.

[11] E. C. Way, “The role of computation in modeling evolution,” Biosystems, vol. 60, pp. 85–94, 2001.

[12] G. Kampis, “Self-modifying systems: A model for the constructive origin of information,” Biosystems, vol. 38, pp. 119–125, 1996.

[13] M. T. Record, Jr., W. S. Reznikoff, M. L. Craig, K. L. McQuade, and P. J. Schlax, “Escherichia coli RNA polymerase (Eσ70), promoters, and the kinetics of the steps of transcription initiation,” in Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, 2nd ed., F. C. Neidhardt et al., Eds. Washington, DC: Amer. Soc. Microbiol., 1996, pp. 792–821.

[14] J. Collado-Vides, R. M. Gutièrrez-Ríos, and G. Bel-Enguix, “Networks of transcriptional regulation encoded in a grammatical model,” Biosystems, vol. 47, pp. 103–118, 1998.

[15] D. RayChaudhuri, G. S. Gordon, and A. Wright, “Protein acrobatics and bacterial cell polarity,” Proc. Nat. Acad. Sci., vol. 98, pp. 1332–1334, 2001.

[16] R. Gregory, R. C. Paton, J. R. Saunders, and Q. H. Wu, “Parallelising a model of bacterial interaction and evolution,” Biosystems, to be published.

[17] ——, COSMIC: Computing systems of microbial interactions and communications. 1—Toward a modeling framework.

[18] J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis With Applications to Biology, Control, and Artificial Intelligence. Cambridge, MA: MIT Press, 1992.

[19] J. H. Holland, L. B. Booker, M. Colombetti, M. Dorigo, D. E. Goldberg, S. Forrest, R. L. Riolo, R. E. Smith, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, “What is a learning classifier system?,” in Lecture Notes in Computer Science, Learning Classifier Systems, From Foundations to Applications, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds. Heidelberg, Germany: Springer-Verlag, 2000, vol. 1813, pp. 3–32.

[20] J. H. Holland, “Properties of the bucket brigade algorithm,” in Proc. 1st Int. Conf. Genetic Algorithms and Their Applications (ICGA’85), J. J. Grefenstette, Ed., pp. 1–7.

[21] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, pp. 237–285, 1996.

[22] L. A. Zadeh, “Fuzzy sets,” Inf. Contr., vol. 8, pp. 338–353, 1965.

[23] E. H. Mamdani and S. Assilian, “An experiment in linguistic synthesis with a fuzzy logic controller,” Int. J. Man-Mach. Stud., vol. 7, pp. 1–13, 1975.

[24] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[25] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1992.

Ray Paton was a Reader in the Department of Computer Science, University of Liverpool, Liverpool, U.K., where he headed the BioComputing and Computational Biology Research Group. He was also a Guest Faculty Member of the Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, NM. He was instrumental in the development of a number of multidisciplinary initiatives, including the international Information Processing in Cells and Tissues conferences and the U.K. Engineering and Physical Sciences Research

Council-funded CytoCom and MIPNETS networks. He was an editor of Biosystems (Elsevier) and of a book series on Studies in Multidisciplinarity (Elsevier). His research interests included biologically inspired computation, theoretical and computational biology, and knowledge modeling. Dr. Paton died suddenly in July 2004.

Richard Gregory received the B.Sc. degree in computer science and the Ph.D. degree in computer science, biology, and engineering from the University of Liverpool, Liverpool, U.K., in 1999 and 2004, respectively. His Ph.D. work led to the ambitious project of microbial evolution in an individual-based architecture, which became known as COSMIC.

He is currently involved with the BioComputing and Computational Biology Research Group, Department of Computer Science, University of Liverpool,

on the reengineering of the COSMIC project on which this paper is partly based.



Costas Vlachos received the B.Eng. degree in applied electronics from the Technological Educational Institution, Athens, Greece, in 1993, and the M.Sc. and Ph.D. degrees in automatic control systems from Liverpool John Moores University, Liverpool, U.K., in 1995 and 2000, respectively.

From 1998 to 2001, he was with the Control Systems Research Group, Liverpool John Moores University, as a Research Assistant. He is currently with the BioComputing and Computational Biology Research Group, Department of Computer Science,

University of Liverpool, Liverpool, U.K. His research interests are in the areas of adaptive and nonlinear control systems theory, robust and optimal control, and global optimization methods with emphasis on evolutionary algorithms.

Jon Saunders received the B.Sc. degree in microbiology and the Ph.D. degree in microbial genetics from the University of Bristol, Bristol, U.K., in 1970 and 1973, respectively.

After postdoctoral work in the Department of Bacteriology, University of Bristol, he was appointed Lecturer in Microbiology at the University of Liverpool, Liverpool, U.K. He has held the Chair of Microbiology since 1992 and is Dean of Science (2000–2004) at the University of Liverpool. His research interests are in the areas of plasmids,

bacteriophages, and other mobile genetic elements in relation to bacterial pathogenesis, biogeochemical cycling, and gene transfer in the environment, both practically and in relation to biocomputing.

Henry Wu (M’91–SM’97) received the M.Sc. (Eng.) degree in electrical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1981 and the Ph.D. degree from the Queen’s University of Belfast (QUB), Belfast, U.K., in 1987.

From 1981 to 1984, he was Lecturer in Electrical Engineering, Huazhong University. He worked as a Research Fellow and Senior Research Fellow at QUB from 1987 to 1991 and Lecturer and Senior Lecturer in the Department of Mathematical Sciences, Loughborough University, Loughborough, U.K., from 1991 to 1995. Since 1995, he has held the Chair of Electrical Engineering in the Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, U.K., acting as the Head of the Intelligence Engineering and Automation Group. His research interests include adaptive control, neural networks, learning systems, multiagent systems, computational intelligence, biocomputation, and power systems control and operation.

Prof. Wu is a Chartered Engineer and a Fellow of IEE.