
A Framework for Megascale Agent Based Model Simulations on the GPU

Mikola Lysenko a,∗, Roshan D’Souza b, Keyvan Rahmani b

a Michigan Technological University, Department of Computer Science, 1400 Townsend Drive, Houghton, MI, USA

b Michigan Technological University, Department of Mechanical Engineering and Computer Science, 1400 Townsend Drive, Houghton, MI, USA

Abstract

This paper presents a series of efficient, data parallel algorithms for simulating agent based models. These include methods for handling environment updates, agent interactions and replication. One of the most important techniques presented in this work is a novel stochastic allocator which enables parallel agent replication in O(1) average time. These techniques can be easily implemented on a modern day graphics processing unit (GPU), resulting in a substantial performance increase. We believe that our system is the first ever completely GPU based agent simulation framework.

Key words: GPGPU, Agent Based Modeling, Data Parallel Algorithms, Stochastic Allocation

1 Introduction

Agent based modeling is a technique which is becoming increasingly popular for describing complex natural phenomena such as crowd and swarm behavior. An agent based model (ABM) describes a system by representing it as a collection of communicating, concurrent objects. While each individual object may have a finite internal state, the combined interaction of all entities can create rich emergent behaviors. These techniques are crucial within the social sciences, where they are being used to develop a rigorous mathematical basis for understanding collective behaviors. Unfortunately, present methods for simulating ABMs are severely limited in terms of their capacity. This is a consequence of the inherently serial nature of typical agent based simulation frameworks.

∗ Corresponding author. Email addresses: [email protected] (Mikola Lysenko), [email protected] (Roshan D’Souza), [email protected] (Keyvan Rahmani).

Preprint submitted to Parallel and Distributed Computing, 30 November 2007

To simulate ABMs, several software frameworks have been developed that enable easy implementation and experimentation. So far, these frameworks include desktop CPU based simulations and parallel cluster computing methods. Most ABM platforms follow a library based paradigm, where a host programming language is extended through generic procedures to facilitate ABM development. In this standard approach, the library includes a framework (a set of concepts for designing ABMs) along with tools for simulation and visualization. The first of these was Swarm [1], written in Objective-C. Repast [2,3] began as a Java implementation of Swarm but diverged significantly. Most recently, MASON [4] is being developed as a new Java platform with emphasis on parallel and distributed execution. The Logo family of platforms takes a more radical approach [5,6], defining a completely new programming language specifically for ABMs. The result is a high-level platform that enables users of all skill levels to rapidly prototype ABMs. The latest incarnation of the Logo family, NetLogo [7], possesses many sophisticated capabilities including behaviors, agent lists, and graphical interfaces. Currently, NetLogo is one of the most widely used platforms for ABM simulation.

Unfortunately, all of the frameworks described above are severely restricted by performance, which in turn limits the scale of the models which may be studied. This is a serious problem because the system level behaviors of an ABM are likely to change substantially with population size. Relevant simulations should have agent populations matching those of real dynamic systems. For example, New York City has a population of about nine million [8], yet current desktop ABM frameworks can handle populations of at most several thousand agents [9]. Significant improvements in performance are necessary.

Previous research focused on using clusters to overcome these performance problems [10–13]. Cluster based computing is effective only if the processors spend the majority of the time computing as opposed to communicating [11]. This is not easy with ABMs, since they typically exhibit high levels of interconnectivity. To lower communication costs, researchers have devised various schemes such as spheres of influence [14], federations [15], strong groups [16], and event horizons [11]. All of these attempt to distribute individual agent objects between different processors based on estimates of probable communications, either a priori or dynamically. The idea behind these methods is that it is better to group agents which frequently communicate with each other on one processor to minimize inter-processor communication. Unfortunately, predicting the communication patterns within an ABM is at least as difficult as actually simulating the system. As a consequence, costly synchronization is inevitable [16].

In this paper, we will show how large scale ABMs can be implemented efficiently on the GPU. Our methods achieve a speed up factor of over 9,000¹. This is due to a combination of asymptotic algorithmic improvements and the massive parallelism of the GPU. The methods proposed in this work are essentially data parallel, and can be adapted to work on other computing platforms such as FPGAs. The issues addressed in this research include modeling of mobile agents, updating grid based environments, inter-agent communication, agent replication and visualization. Our framework is composed of three essential parts. First, we describe a system for handling environments or static agents. Next, we show how to handle the interactions of spatial or mobile agents which move independently of the background grid structures. Finally, we discuss the user interface components and visualization.

Fig. 1. Various environmental parameters, such as obstacles or scalar fields, are stored in the components of the color channels.

2 Environment

Environments, or static agents, form the background for common agent simulations. Typically, they are stored as a lattice of values representing ambient quantities such as chemical concentration, heat or geographic structures. The type of values and the structure of the grid are determined by the agent based model. For discrete systems, cellular automata (CA) are often used [18], while other systems may use convolution functions, such as the Laplacian for diffusion [19]. These types of environment agents are distinctly well suited for data parallel execution, and have been extensively covered within existing literature [20–23], so we will discuss them only briefly.

On the GPU, textures are ideal for representing the kinds of environments used in common agent based models. Numerical quantities are packed into the red, green, blue and alpha channels within each texture element as illustrated in Fig. 1. For more complex environments, additional channels can be layered using the extra color attachments within each frame buffer object. Updating the environment is accomplished using the classic ping-pong technique [24]. The basic idea behind ping-ponging is to iteratively update some texture by repeatedly passing it between frame buffers using a fragment shader. Fig. 2 illustrates this process. In the first pass, the contents of buffer A are rendered into buffer B using a fragment shader.

¹ Compared to EcoLab’s StupidBugs v16, grid size 2048x2048, 2 million agents [17]. EcoLab evaluates 1 update per 200 seconds, while our GPU implementation achieves more than 50 updates per second.

Fig. 2. The ping-pong technique forms the basis for most GPGPU processing. Images are iteratively updated as they are rendered from one frame buffer to the next.

In the next pass, B is rendered into A, and so on. In each pass, the state of the environment is advanced by one complete timestep. For the purposes of this paper, the texture containing the environment data will be known as the world texture. Indices into this texture are known as world coordinates.
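The ping-pong update can be sketched on the CPU as two buffers that swap roles after every pass. This is only an illustration under stated assumptions, not the shader code used in this work: the function names `step` and `run` and the four-neighbour averaging rule are hypothetical, and plain Python lists stand in for the frame buffers.

```python
def step(src, dst):
    """One pass: read only from src, write only to dst.

    The rule (a toroidal 4-neighbour average) is an arbitrary stand-in
    for whatever environment update the model requires.
    """
    h, w = len(src), len(src[0])
    for y in range(h):
        for x in range(w):
            dst[y][x] = (src[(y - 1) % h][x] + src[(y + 1) % h][x]
                         + src[y][(x - 1) % w] + src[y][(x + 1) % w]) / 4.0


def run(world, passes):
    """Ping-pong between two buffers and return the most recent state."""
    a = [row[:] for row in world]               # buffer A: current state
    b = [[0.0] * len(row) for row in world]     # buffer B: render target
    for _ in range(passes):
        step(a, b)    # render A into B
        a, b = b, a   # swap roles: B is the source of the next pass
    return a


world = [[0.0] * 4 for _ in range(4)]
world[1][1] = 4.0
after_one = run(world, 1)
```

Because every cell reads only from the source buffer, the cells of one pass can be evaluated in any order, which is what makes the scheme data parallel.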

These basic ideas have long been used to directly implement cellular automata on the GPU [22]. Harris et al. developed the concept of coupled lattice maps as an extension of CAs for the purposes of simulating clouds and natural phenomena [21]. Recently, Perumalla and Banya extended some of these simple techniques to include stochastic cellular automata [25]. Convolution operations have already been heavily researched in the context of image processing [23]. Broadly speaking, there are three methods for evaluating a convolution operation using the GPU. The simplest method is direct convolution, which is optimal for small filter sizes (4x4 or less). For larger convolutions, there are two basic approaches. The first only works for what is known as a separable convolution. Separable convolutions can be expressed as the product of two smaller convolutions, so they can be executed by simply running one convolution, then the other. For non-separable convolutions, the fast Fourier transform is best. Several GPU libraries exist for performing this operation [26,27].
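The separable case can be checked with a small CPU sketch. The helper names `conv2d` and `conv_separable`, the toroidal wrapping, and the 1-2-1 blur weights are assumptions chosen for illustration; the point is only that a kernel formed as the outer product of a column and a row vector gives the same result when applied as two one-dimensional passes.

```python
def conv2d(img, kernel):
    """Direct 2D filtering with toroidal wrapping (kernel is a list of rows)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    sy = (y + j - kh // 2) % h
                    sx = (x + i - kw // 2) % w
                    acc += kernel[j][i] * img[sy][sx]
            out[y][x] = acc
    return out


def conv_separable(img, col, row):
    """Two 1D passes: a horizontal pass with `row`, then a vertical pass with `col`."""
    horizontal = conv2d(img, [row])                  # 1 x kw kernel
    return conv2d(horizontal, [[c] for c in col])    # kh x 1 kernel


blur = [1.0, 2.0, 1.0]
kernel = [[c * r for r in blur] for c in blur]       # outer product: 3x3 kernel
img = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(5)]

direct = conv2d(img, kernel)
separable = conv_separable(img, blur, blur)
```

For a k x k kernel, the direct form reads k*k values per output element while the two-pass form reads 2k, which is why separability matters for larger filters.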

A final technical concern when evaluating environment updates is handling boundary conditions. Toroidal boundaries can be implemented on the GPU using texture wrapping. For arbitrary boundaries, a much better strategy is to store the boundary as a mask within a separate texture. Then, at the start of each environment update pass, this bitmask is rendered into the depth buffer. This exploits the early z-cull feature in graphics hardware and greatly reduces the amount of fragment shader computation [28].

struct AGENT {
    float x;
    float y;
    float health;
    ...
};

Fig. 3. An encoding of the agent state into the agent state texture.

3 Mobile Agents

Environments by themselves are not sufficient for a complete agent based modeling toolkit. Moreover, simulating mobile agents on the GPU is substantially more complicated than fixed position environments. The primary difficulty lies in connecting the agents to the external environment and their neighbors. Broadly, there are three basic tasks which must be performed, ideally entirely within the GPU:

(1) Store the mobile agent state

(2) Update the mobile agent state

(3) Connect the mobile agents to the environment

To solve the first issue, we adopt the standard GPGPU technique of encoding the agent information into a texture known as the state texture. Under this scheme, each mobile agent’s state is packed into the texels’ RGBA color values as shown in Fig. 3. These color values are then interpreted as state variables, such as location or size. If the agent state cannot be squeezed into 4 floating point values, then it is possible to add additional color buffers. A side effect of this encoding scheme is that each agent is automatically given a unique identifier based on its coordinates within the state texture; this identifier is known as the agent ID.
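The encoding described above can be sketched as follows. The channel assignment (R = x, G = y, B = health, A = alive flag), the texture width `WIDTH`, and the row-major linearisation of the texel coordinates into a single integer are all assumptions made for this illustration; the text only states that the ID is derived from the coordinates within the state texture.

```python
WIDTH = 4  # hypothetical width of the state texture, in texels


def agent_id(u, v, width=WIDTH):
    """Agent ID derived from texel coordinates (u, v), row-major (an assumption)."""
    return v * width + u


def texel_coords(aid, width=WIDTH):
    """Inverse of agent_id: recover the texel coordinates of an agent."""
    return aid % width, aid // width


def pack(x, y, health, alive):
    """Pack one agent into an RGBA texel: R=x, G=y, B=health, A=alive flag."""
    return (float(x), float(y), float(health), 1.0 if alive else 0.0)


def unpack(texel):
    """Interpret the four color channels as state variables again."""
    r, g, b, a = texel
    return {"x": r, "y": g, "health": b, "alive": a > 0.5}


texel = pack(5, 5, 0.75, True)
state = unpack(texel)
```

State that does not fit into four channels would go into a second tuple per agent, mirroring the additional color buffers mentioned in the text.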

The second issue, updating the mobile agents, is implemented with ping-ponging. Each individual type of behavior, such as movement, reproduction and communication, must be handled separately. For randomized behavior, such as movement or searching, it may be necessary to implement a per agent random number generator. This can be achieved effectively using the techniques presented by Sussman et al. [29]. In the following sections, each of these issues will be discussed individually in greater detail. Some of these interactions may require multiple iterations of the ping-pong technique in order to resolve complex behaviors. But first we shall examine how the mobile agents are connected spatially to the environment.

Fig. 4. Scattering the agents (panels: State Texture, Collision Map).

The third issue, connecting mobile agents to the environment, is performed differently for each direction. For the mobile agents, reading from the environment can be performed using a double look up into the world texture using their position. Connecting the environment back to the agents is more difficult, and requires the use of a scatter operation. Scattering on GPUs has been described in several previous works on GPGPU programming [30]. There are two primary methods for achieving this operation using shaders. The first is to use the render to vertex buffer technique to draw the desired positions for the agents into a buffer, then use these positions to draw point primitives within the world texture. The second option is to use the vertex processor’s read from texture feature to directly locate the agents within the environment. Either way, agent scattering is a costly operation and it is desirable to minimize its cost. To do this, the agent positions are typically scattered only once into a separate buffer known as the collision map, as shown in Fig. 4. The size of the collision map is the same as the size of the environment texture. Each fragment within the collision map stores the agent ID of the agent at that position, or a null value if it is empty. Using this data, all spatial queries can be translated directly into texture reads from the collision map.
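A CPU sketch of the collision map may make the data flow clearer. The names `scatter`, `neighbours` and `EMPTY`, the 12x12 grid, and the toroidal wrapping are assumptions for illustration; on the GPU the scatter is performed by drawing point primitives rather than by a loop.

```python
EMPTY = -1  # null value stored in unoccupied cells


def scatter(positions, width, height):
    """Build the collision map: each cell holds an agent ID or EMPTY."""
    cmap = [[EMPTY] * width for _ in range(height)]
    for aid, (x, y) in enumerate(positions):
        cmap[y][x] = aid   # in this sketch a later agent simply overwrites
    return cmap


def neighbours(cmap, x, y, radius=1):
    """Spatial query as plain reads from the collision map (toroidal wrap)."""
    h, w = len(cmap), len(cmap[0])
    found = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            aid = cmap[(y + dy) % h][(x + dx) % w]
            if aid != EMPTY:
                found.append(aid)
    return found


# Agent positions; the index in this list plays the role of the agent ID.
positions = [(5, 5), (1, 3), (11, 6), (7, 10)]
cmap = scatter(positions, 12, 12)
```

Once the map is built, a query such as `neighbours(cmap, 6, 6)` costs a fixed number of reads regardless of how many agents exist.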

The overall structure of the system is depicted in Fig. 5. The loop shown in the right side of the figure shows the update kernel for the agent state texture. The frame buffer objects, aFBO Front and aFBO Back, are used to store the state texture for ping-ponging. On the left part of the figure, the environment is updated using a symmetric ping-ponging operation. The center of the figure shows how these components are connected through the agent scatter kernel.


Fig. 5. A complete overview of the framework.

4 Execution Order and Collisions

One obstacle to parallel execution of current agent-based models is that many of them specify an exact execution order. Doing so is at odds with the very nature of parallelism, and if taken to the logical extreme it becomes impossible to achieve any parallel speed up. To overcome this problem, it is best to split the simulation into several phases, each of which updates some subset of the agents in an arbitrary order. Each of these phases is then implemented using a separate rendering pass. However, at times a finer grained level of ordering is absolutely required. A pathological example is the StupidBugs model, which requires that larger bugs preempt smaller bugs [31]. To guarantee this, we need some assurance of atomicity within the operations. This is accomplished on the GPU through the use of the z-culling feature, though on other architectures an atomic compare-and-set instruction would work equally well.

One way to intuitively think of this scheme is to imagine that each possible action is associated with a unique agent. Let the set of all possible actions be $T = \{T_1, T_2, \ldots, T_m\}$, and the set of all agents be $A = \{a_1, a_2, \ldots, a_n\}$. Each agent $a_i \in A$ is also associated with a unique event $t_i \in T$. To handle priority, each agent’s event, $t_i$, is assigned a priority $p_i \in [0, 1]$. Agent/event pairs with higher priority take precedence over those with lower priority. For a given time step, the goal of the algorithm is to determine which agents have the highest priority for a given action.


The mapping from actions to agents is given by $ID : T \to A$ and from actions to priorities by $P : T \to [0, 1]$, both of which may be represented as arrays of size $m$. Once $ID$ is computed, an agent $a_i$ may take action $t_i$ if and only if $ID[t_i] = a_i$. To compute the contents of $ID$, each agent uses an atomic operation to simultaneously set its priority for its action. Agents which successfully wrote a value may then act. The entire process is summarized in Alg. 1.

Algorithm 1. Fine grained prioritized action algorithm

    P[] ← 0
    ID[] ← NULL
    for all a_i in Agents do
        begin atomic
            if P[t_i] <= p_i then
                P[t_i] = p_i
                ID[t_i] = a_i
            end if
        end atomic
    end for
    for all a_i in Agents do
        if ID[t_i] = a_i then
            Perform action t_i
        end if
    end for
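Algorithm 1 can be emulated sequentially to see which agents end up acting. In the sketch below the function names `resolve` and `winners` and the example values are hypothetical, and the atomic section is modelled by an ordinary loop, so ties are resolved by iteration order rather than by the hardware depth test.

```python
def resolve(actions, priorities, num_actions):
    """First loop of Algorithm 1.

    actions[i] is the index of the action t_i requested by agent i and
    priorities[i] is its priority p_i in [0, 1]. Returns the ID array,
    where ID[t] is the winning agent for action t (None if unclaimed).
    """
    P = [0.0] * num_actions
    ID = [None] * num_actions
    for agent, (t, p) in enumerate(zip(actions, priorities)):
        # On the GPU this compare-and-write is made atomic by z-culling.
        if P[t] <= p:
            P[t] = p
            ID[t] = agent
    return ID


def winners(actions, ID):
    """Second loop of Algorithm 1: agents whose write survived may act."""
    return [agent for agent, t in enumerate(actions) if ID[t] == agent]


# Agents 0, 1 and 3 compete for action 3; agent 2 is alone on action 1.
actions = [3, 3, 1, 3]
priorities = [0.2, 0.9, 0.5, 0.4]
ID = resolve(actions, priorities, num_actions=4)
acting = winners(actions, ID)
```

In the StupidBugs setting, the action index would correspond to the target cell and the priority to the bug size.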

As an illustrative example, consider the StupidBugs movement problem. When the vertex processor scatters the bugs, the value of the depth coordinate is set to the size of the bug. As a result, if two bugs move to the same space within the collision map, then the larger bug will take the spot over the smaller bug. This process is repeated several times until each bug is in a valid square. Of course, this may still fail if after each iteration the bug is preempted, so the process may need to be repeated. In the worst case, there are potentially 80 agents that may move into a given cell within the StupidBugs model, due to the 9x9 movement radius. However, a more practical number of passes is typically 5, which seems to work successfully in all observed instances. The basic ideas in this technique form the basis for updating the collision map within spatial models.

5 Death and Reproduction

Agent death and reproduction are important features in population models. A classic example from the social sciences is Epstein and Axtell’s SugarScape model [32], which uses agent reproduction and inheritance to simulate the development of a society. Dynamic populations also play a central role in ecological simulations, such as predator-prey models. Finally, natural selection within evolutionary models is impossible without some level of death and rebirth.


Fig. 6. A simplified view of the agent state texture. Black cells are empty (dead), and whitecells are filled (alive). Cells with a red dot are about to replicate (gravid).

Compared to reproduction, handling death on the GPU is relatively simple. Each agent fragment is given a flag to determine if it is alive or dead. Often this may be combined with other state variables through bitwise manipulation. If the agent is determined to have died on a given update, then this flag is turned off and consequently the agent dies. When the agent state texture is updated, all agents marked as dead are simply ignored, and dead agents are not scattered by linking processes.
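A small sketch of the flag handling follows. The bit layout (`ALIVE`, `GRAVID`), the health-based death rule and the function names are assumptions made purely for illustration; the text only specifies that a per-agent flag exists and may share storage with other state.

```python
ALIVE = 0b01    # hypothetical bit layout
GRAVID = 0b10


def update(agent):
    """Return the next state of one agent; dead agents pass through untouched."""
    flags, health = agent
    if not flags & ALIVE:
        return agent                 # ignored by the update pass
    health -= 1                      # stand-in for the model's own rules
    if health <= 0:
        flags &= ~ALIVE              # turn the flag off: the agent dies
    return (flags, health)


def scatter_targets(agents):
    """Only live agents are scattered into the collision map."""
    return [i for i, (flags, _) in enumerate(agents) if flags & ALIVE]


agents = [(ALIVE, 1), (ALIVE | GRAVID, 3), (0, 0)]
next_agents = [update(a) for a in agents]
```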

Agent birth is substantially more complicated. The main problem lies in allocating new agents. GPUs are fundamentally parallel machines, and allocation is typically viewed as a strictly serial operation. Furthermore, allocation must be efficient, since many agents will be giving birth each update. In previous work on GPU particle systems, Kolb et al. used a CPU allocator to handle the creation of new particles on the GPU [33]. However, this scheme is not suitable for agent systems, since the number of allocations is far too large. The simple act of transferring the agent state texture to the CPU and iterating over the state texture grinds performance to a halt.

Overcoming these difficulties requires a totally different strategy. In order to leverage the parallelism of the GPU, we relax the restriction that all allocations must complete. This is not as unreasonable as it first sounds, since even CPU allocators may fail if their internal memory becomes filled or overly fragmented. It will be shown that our parallel allocator behaves similarly, with odds of success directly proportional to the amount of free memory. The allocator is not affected by memory fragmentation, and takes on average O(1) time. Moreover, the probability of successful allocation quickly converges to 1 within a few iterations, which is good enough for a large scale simulation. We begin first with an informal description of the algorithm’s essential structure.

Agent replication is initiated by setting a flag within the agent state texture that signals that the agent is gravid, or about to reproduce. For the purposes of illustrating the reproduction process, Fig. 6 represents a simplified version of the state texture. The basic goal of the allocator is to place each newly created agent into one of the empty cells. In other words, it must match each gravid cell to a unique empty cell. The dual situation must also hold: in order for an empty cell to become alive, it needs to know about some unique gravid parent which will describe its initial state. One way to think about this problem is that we are searching for an invertible map from the agent state texture onto itself. Additionally, evaluating this map must be easily done in parallel, otherwise the allocator will perform no better than its sequential counterpart. As previously observed, finding an optimal mapping is rather difficult. To resolve this issue, we propose using an iterative randomized scheme. Failed mappings from gravid agents to live agents will be treated as failed allocations. For the sake of simplicity, assume a 1-dimensional agent state texture. Let $A = \{a_0, a_1, \ldots, a_{n-1}\}$ be the agent state texture and $f : A \to A$ an invertible function:

$$f(a_i) = a_{i+r} \qquad (1)$$

$$f^{-1}(a_i) = a_{i-r} \qquad (2)$$

where $r$ is a random integer in the range $[1, n)$. The arithmetic in the expression $a_{i+r}$ is evaluated modulo $n$, so for example $a_{n+3}$ maps to $a_3$. Moreover, it is trivial to evaluate both $f$ and its inverse, $f^{-1}$, using a fragment shader. Using Eqn. 1, it is now possible to construct the following parallel agent allocation algorithm.

Algorithm 2. Parallel agent allocation

    for i = 0 to NumPasses do
        r = a random integer between 1 and n-1
        for all agent a_i do
            if a_i is gravid and a_{i+r} is empty then
                mark a_i not gravid
            else if a_i is empty and a_{i-r} is gravid then
                mark a_i born
            end if
        end for
    end for
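Algorithm 2 can be sketched on a one-dimensional list of cells. The state constants, the function names, and the choice to mark both the satisfied parent and the newborn simply as `ALIVE` are assumptions for this illustration; a real implementation would also copy the parent's state into the child.

```python
import random

EMPTY, ALIVE, GRAVID = 0, 1, 2   # hypothetical cell states


def allocate_pass(cells, r):
    """One data parallel pass: every cell reads the old state, writes the new."""
    n = len(cells)
    new = list(cells)
    for i in range(n):
        if cells[i] == GRAVID and cells[(i + r) % n] == EMPTY:
            new[i] = ALIVE       # parent: replication succeeded, not gravid
        elif cells[i] == EMPTY and cells[(i - r) % n] == GRAVID:
            new[i] = ALIVE       # child: born from the parent at i - r
    return new


def allocate(cells, num_passes, rng):
    """Run several passes, drawing one shared offset r per pass."""
    for _ in range(num_passes):
        r = rng.randrange(1, len(cells))   # uniform integer in [1, n)
        cells = allocate_pass(cells, r)
    return cells


cells = [GRAVID, ALIVE, EMPTY, GRAVID, EMPTY, ALIVE]
after = allocate_pass(cells, 2)
```

With r = 2 the parent in cell 0 finds the empty cell 2, while the parent in cell 3 lands on the occupied cell 5 and stays gravid for a later pass. Because the offset map is invertible, each pass creates exactly as many children as it satisfies parents.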

Algorithm 2 is not guaranteed to replicate all agents. However, it may be run multiple times in order to improve the odds of success. Suppose that $l$ is equal to the number of non-empty cells. Then the probability, $p(k)$, that a given gravid agent has successfully replicated after $k$ iterations of Algorithm 2 is given by:

$$p(k) = 1 - \left(\frac{l}{n}\right)^k \qquad (3)$$

Since $\frac{l}{n} < 1$, $p(k)$ quickly approaches 1 as $k$ increases. The argument for this is similar to that used in the birthday paradox. If half of the agent cells are currently filled, the probability of success reaches over 95% after only 5 iterations. The odds of success can be improved by allocating larger agent state textures, which in turn decrease the likelihood of an invalid collision. To illustrate this behavior, consider Fig. 7. For this example $r = 2$. Two of the gravid cells are successfully connected to empty cells. The other two inadvertently collide with already filled cells, and are not able to replicate. At the end of the iteration, the agent state texture looks like Fig. 8.
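Eqn. 3 is easy to evaluate directly, which helps when choosing the number of passes. The helper names below are hypothetical, and the sketch assumes that at least one cell is free.

```python
def success_probability(k, filled, total):
    """Eqn. 3: probability of success within k passes, given l = filled, n = total."""
    return 1.0 - (filled / total) ** k


def passes_needed(target, filled, total):
    """Smallest k whose success probability reaches `target` (needs filled < total)."""
    if filled >= total:
        raise ValueError("no free cells: allocation can never succeed")
    k = 1
    while success_probability(k, filled, total) < target:
        k += 1
    return k


half_full = success_probability(5, 50, 100)    # the case quoted in the text
k_half = passes_needed(0.95, 50, 100)
k_three_quarters = passes_needed(0.95, 75, 100)
```

For a half-filled texture, five passes give $1 - 0.5^5 = 0.96875$, matching the figure of over 95% stated above; a texture that is three quarters full needs 11 passes to reach the same target, which shows why a larger state texture improves the odds.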

Implementing Algorithm 2 on the GPU is straightforward. The random variable is set as a uniform value by the CPU at the start of the rendering pass, and the arithmetic is done using 2D indexing. For efficiency purposes, multiple iterations of the algorithm can be combined into a single pass; however, the number of indirect lookups caused by this optimization increases exponentially, so it is only suitable for small numbers of passes.

Fig. 7. Connecting gravid cells to empty cells.

Fig. 8. The result of one iteration of the allocator shader.

6 Visualization

The enormous performance increase given by the GPU implementation enables unprecedented levels of real-time user interaction. The GPU’s capabilities make certain tasks such as 3D visualization much easier than on the CPU. The situation is further helped by the fact that our previous simulation methods are contained entirely within the GPU. As a result, all that needs to be done is to translate the internal data into visual feedback for the user using programmable GPU techniques. This visual feedback is especially important for agent based simulations. It is critical when debugging and prototyping models, since vision provides the most natural method for studying the behavior of an evolving system.

For strictly 2-dimensional simulations, one possible approach is to use a combination of masking and blending operations to combine the collision map and world texture, creating a directly viewable image using a single pass fragment shader. Certain quantities in the world can be re-colored to enhance the quality of the visualization and make it easier for the user to discern the state of the system. In some simulations, the environment can be interpreted as a height map. In this situation, the agents are allowed to move in 3 dimensions, while the environment is effectively a 2-dimensional field. The display of height map data has been well researched in the context of GPU programming. One particularly efficient scheme is Losasso & Hoppe’s Geometry ClipMaps [34], which has the advantage that it executes entirely within the GPU. The ClipMap algorithm can be further simplified in this instance, since all necessary data is already within the GPU and therefore the required computation time can be minimized.
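The 2-dimensional compositing step can be sketched as a per-pixel selection between the two inputs. The color choices, the `EMPTY` sentinel and the function name `composite` are assumptions for illustration; a fragment shader would perform the same selection once per output pixel.

```python
EMPTY = -1                      # null value in the collision map
AGENT_COLOR = (255, 255, 255)   # hypothetical color for occupied cells


def composite(world, cmap):
    """Combine a scalar world field (values in [0, 1]) with the collision map.

    Occupied cells are drawn in AGENT_COLOR; free cells show the world
    value re-colored as a shade of green.
    """
    image = []
    for world_row, cmap_row in zip(world, cmap):
        row = []
        for value, aid in zip(world_row, cmap_row):
            if aid != EMPTY:
                row.append(AGENT_COLOR)                   # mask: agent on top
            else:
                clamped = max(0.0, min(1.0, value))
                row.append((0, int(255 * clamped), 0))    # re-colored field
        image.append(row)
    return image


world = [[0.0, 1.0], [0.5, 0.25]]
cmap = [[EMPTY, 3], [EMPTY, EMPTY]]
image = composite(world, cmap)
```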

For volumetric models there are several options [35]. Perhaps the most direct method is to render volumetric slices back to front by sampling the 3-dimensional texture. More recently, raycasting has become a popular option for GPUs, since it incurs a far lower overdraw penalty [36]. Both are capable of executing entirely within the GPU. Rendering the mobile agents in a 3D environment has been well researched in the context of crowd visualization. Extremely efficient rendering is possible using the impostor methods given by Rudomín and Millán [37]. This strategy works well for both 2D and 3D visualization, since it is implemented directly by the GPU hardware.

7 Results

To test the performance of the GPU compared to other computing methods, we implemented several popular models. The models were implemented using C++, OpenGL and GLSL. For benchmarks, the system used was an AMD Athlon 64 3500+ with 1 GB RAM and an NVIDIA GeForce 8800 GTX GPU. For the operating system, we used Ubuntu Linux 7.04. At the time of purchase, the total cost of this system was under $1400. For comparison, several popular non-GPU platforms were also benchmarked, such as NetLogo [6], Repast [2] and MASON [4].

7.1 Sugar Scape

Table 1. SugarScape (100x100) Performance Comparison

    Platform    Updates per second
    GPU         2000
    Repast      16.05

Table 2. SugarScape Grid Size Performance (2 million agents)

    Grid size    Updates per second
    256x256      95.7
    512x512      85
    1024x1024    74.7
    2048x2048    52.6
    4096x4096    30.2

Sugar Scape is a social agent based model developed by Epstein and Axtell [32]. Despite its simplicity, Sugar Scape is an extremely relevant model since many more complex social simulations are based on the same underlying principles [38]. Agents in SugarScape have a number of attributes such as vision or metabolism, and are capable of adapting to varying environments. For the purposes of comparison, we implemented rules G, R and M, similar to Repast’s demo program [2]. On a 100x100 grid, the average updates per second of both models are given in Table 1. The overall speed up of the GPU versus Repast is over 124x. With this performance increase, it is possible to handle previously inconceivable grid sizes and scales. An image of the GPU running a 2560x1024 resolution simulation of over one million agents is shown in Fig. 9. The average frames per second on this simulation is approximately 56. The exact timings of the GPU’s performance with respect to environment grid size are given in Table 2. With the GPU, it is possible to achieve over 16 million concurrent agents on grid sizes of up to 4096x4096. The effects of varying the number of mobile agents are shown in Table 3.

Fig. 9. SugarScape with a 2560x1024 grid and over 1 million agents.

7.2 Stupid Bugs

The Stupid Bug model, originally introduced by Railsback et al., is a benchmark for agent based modeling toolkits [31]. It functions as a kind of stress test, containing many costly functions such as 9x9 vision filters, rapid replication and multiple agent types. To test the limitations of our system, we implemented StupidModel version 16, the most complex version of the simulation. For comparison, we ran our implementation on a 256x128 grid with about 2000 bugs, which is slightly larger than that used by Railsback et al. in their survey. The relative performance of the GPU compared to other systems is shown in Fig. 11. Once more, the GPU dramatically outperforms other platforms. Going further, we once again try to push the GPU to the limit of what is possible. As shown in Table 4, the GPU can easily handle population sizes well over 1 million agents, and grid sizes of over 2048x2048. An image of StupidBugs running with 1 million agents on a 2560x1024 grid is shown in Fig. 10. The performance of the GPU StupidBugs implementation with respect to the number of agents is shown in Table 5. Our system also compares favorably with the findings of Standish, who reported performance of approximately 1 update per 200 seconds on StupidBugs v16 with one million agents on a 2048x2048 grid on the EcoLab simulation toolkit [17].

Table 3. SugarScape Agent Performance (2560x1024 grid size)

    Number of agents    Updates per second
    16777216            10.2
    8388608             19.4
    4194304             32.6
    2097152             56.6
    1048576             93.4
    524288              130.6
    262144              189.9

Fig. 10. Stupid Model version 16 with a 2560x1024 grid and over 1 million agents.

Fig. 11. Performance of the GPU implementation of Railsback et al.’s StupidBugs version 16 compared to standard modeling toolkits (logarithmic scale).

Table 4. StupidBugs v16 Grid Size Performance (2 million agents)

    Grid size    Updates per second
    256x256      45
    512x512      40
    1024x1024    36
    2048x2048    27

8 Conclusion

We have successfully implemented an ABM simulation on the GPU. Our simulation runs entirely on the GPU and takes full advantage of the large memory bandwidth and parallel computational power. To the best of our knowledge, there are no single computer ABM frameworks that can deliver the performance of our prototype system. We suspect that our prototype will outperform many cluster solutions as well.

Currently, online statistics gathering is not implemented. However, we are working on an algorithm that is based on image histogram generation, a topic well researched in computer graphics [39,40], which should provide interactive statistical displays. While the performance of GPUs is phenomenal, programming them is counterintuitive. NVIDIA’s CUDA [41] system provides a substantially better interface, and in the future we plan to develop CUDA based libraries for essential ABM functions to ease deployment of ABM simulations on GPUs.

References

[1] N. Minar, R. Burkhard, C. Langton, M. Askenazi, The Swarm simulation system: A toolkit for building multi-agent simulations, Tech. rep. (1996).

[2] N. Collier, Repast: An agent based modelling kit for Java (2001).

[3] E. Tatara, M. J. North, T. R. Howe, N. T. Collier, J. R. Vos, An introduction to Repast Simphony modeling using a simple predator-prey example, Proceedings of the Agent 2006 Conference on Social Agents: Results and Prospects.

Table 5. StupidBugs v16 Agent Performance (2560x1024 grid size)

    Max agents    Updates per second
    262144        92.9
    524288        69.4
    1048576       49.5
    2097152       31.2
    4194304       17.8
    8388608       9.3

[4] S. Luke, C. Cioffi-Revilla, L. Panait, K. Sullivan, Mason: A new multi-agentsimulation toolkit, Proceedingso of the 2004 SwarmFest Workshop.

[5] M. Resnick, Starlogo: an environment for decentralized modeling and decentralizedthinking, in: CHI ’96: Conference companion on Human factors in computingsystems, ACM, New York, NY, USA, 1996, pp. 11–12.

[6] U. Wilensky, Netlogo (1999).

[7] S. Tisue, U. Wilensky, Netlogo: Design and implementation of a multi-agent modelingenvironment.

[8] New york city population projections by age/sex and borough (2006).

[9] S. F. Railsback, S. L. Lytinen, S. K. Jackson, Agent-based simulation platforms:Review and development recommendations, SIMULATION.

[10] F. Massaioli, F. Castiglione, M. Bernaschi, Openmp parallelization of agent-basedmodels, Parallel Comput. 31 (10-12) (2005) 1066–1081.

[11] M. Scheutz, P. Schermerhorn, Adaptive algorithms for the dynamic distribution andparallel execution of agent-based models, J. Parallel Distrib. Comput. 66 (8) (2006)1037–1051.

[12] M. Quinn, R. Metoyer, K. Hunter-Zaworski, Parallel implementation of the social forces, in: Proceedings of the Second International Conference in Pedestrian and Evacuation Dynamics, 2003, pp. 63–74.

[13] T. Da-Hun, F. Tang, T. Lee, A. Krishnan, A. Goryachev, Parallel computing platform for agent-based modeling of multicellular biological systems, Lecture Notes in Computer Science.

[14] B. Logan, G. Theodoropoulos, The distributed simulation of multi-agent systems (2001). URL citeseer.ist.psu.edu/logan00distributed.html

[15] M. Lees, B. Logan, T. Oguara, G. Theodoropoulos, Simulating agent-based systems with HLA: The case of SIM AGENT – part II, in: Proceedings of the 2003 European Simulation Interoperability Workshop, European Office of Aerospace R&D, Simulation Interoperability Standards Organisation and Society for Computer Simulation International, 2003.

[16] T. Som, R. Sargent, Model structure and load balancing in optimistic parallel discrete event simulation, in: Proceedings of the fourth workshop on parallel and distributed simulation, 2000, pp. 147–154.

[17] R. K. Standish, Going stupid with ecolab (2007).

[18] S. Wolfram, A New Kind of Science, Wolfram Media Inc., Champaign, Illinois, United States, 2002.

[19] E. Bonabeau, From classical models of morphogenesis to agent-based models of pattern formation, Artif. Life 3 (3) (1997) 191–211.

[20] M. J. Harris, W. V. Baxter, T. Scheuermann, A. Lastra, Simulation of cloud dynamics on graphics hardware, in: HWWS ’03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Eurographics Association, Aire-la-Ville, Switzerland, 2003, pp. 92–101.

[21] M. J. Harris, G. Coombe, T. Scheuermann, A. Lastra, Physically-based visual simulation on graphics hardware, in: HWWS ’02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Eurographics Association, Aire-la-Ville, Switzerland, 2002, pp. 109–118.

[22] J. Singler, Implementation of cellular automata using a graphics processing unit.

[23] O. Fialka, M. Čadík, Fft and convolution performance in image filtering on gpu, in: IV ’06: Proceedings of the conference on Information Visualization, IEEE Computer Society, Washington, DC, USA, 2006, pp. 609–614.

[24] D. Goddeke, Gpgpu tutorials - basic math.

[25] K. S. Perumalla, B. G. Aaby, Data parallel execution challenges and runtime performance of agent simulations on gpus.

[26] K. Moreland, E. Angel, The fft on a gpu, in: HWWS ’03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Eurographics Association, Aire-la-Ville, Switzerland, 2003, pp. 112–119.

[27] NVIDIA, CUDA: CUFFT Library (2007).

[28] M. Harris, Gpgpu: Beyond graphics, Tech. rep. (2004).

[29] M. Sussman, W. Crutchfield, M. Papakipos, Pseudorandom number generation on the gpu, Graphics Hardware.

[30] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, T. J. Purcell, A survey of general-purpose computation on graphics hardware, Computer Graphics Forum 26 (1) (2007) 80–113.

[31] S. Railsback, S. Lytinen, V. Grimm, Stupidmodel and extensions: A template and teaching tool for agent-based modeling platforms.

[32] J. M. Epstein, R. L. Axtell, Growing Artificial Societies: Social Science from the Bottom Up, MIT Press, 1996.

[33] A. Kolb, L. Latta, C. Rezk-Salama, Hardware-based simulation and collision detection for large particle systems, in: HWWS ’04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, ACM Press, New York, NY, USA, 2004, pp. 123–131.

[34] F. Losasso, H. Hoppe, Geometry clipmaps: terrain rendering using nested regular grids, in: SIGGRAPH ’04: ACM SIGGRAPH 2004 Papers, ACM Press, New York, NY, USA, 2004, pp. 769–776.

[35] K. Engel, M. Hadwiger, J. M. Kniss, A. E. Lefohn, C. R. Salama, D. Weiskopf, Real-time volume graphics, in: SIGGRAPH ’04: ACM SIGGRAPH 2004 Course Notes, ACM Press, New York, NY, USA, 2004, p. 29.

[36] J. Krüger, R. Westermann, Acceleration techniques for gpu-based volume rendering, in: VIS ’03: Proceedings of the 14th IEEE Visualization 2003 (VIS’03), IEEE Computer Society, Washington, DC, USA, 2003, p. 38.

[37] E. Millán, I. Rudomín, Impostors and pseudo-instancing for gpu crowd rendering, in: GRAPHITE ’06: Proceedings of the 4th international conference on Computer graphics and interactive techniques in Australasia and Southeast Asia, ACM Press, New York, NY, USA, 2006, pp. 49–55.

[38] M. Drewek, W. Bulleit, Simulating terrorism in a community.

[39] O. Fluck, S. Aharon, D. Cremers, M. Rousson, Gpu histogram computation, in: SIGGRAPH ’06: ACM SIGGRAPH 2006 Research posters, ACM Press, New York, NY, USA, 2006, p. 53.

[40] T. Scheuermann, J. Hensley, Efficient histogram generation using scattering on gpus.

[41] M. Harris, Cuda: performance tips and tricks, in: SIGGRAPH ’07: ACM SIGGRAPH 2007 courses, ACM, New York, NY, USA, 2007, p. 9.
