
Advanced Methods for Space Simulations, edited by H. Usui and Y. Omura, pp. 91–99. © TERRAPUB, Tokyo, 2007.

A New Methodology for Multi-Scale Simulation of Plasmas

H. Karimabadi1, Y. Omelchenko1, J. Driscoll1, R. Fujimoto2, and K. Perumalla2

1 SciberQuest, Inc., Solana Beach, CA 92075, U.S.A.
2 Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.

The traditional technique for simulating physical systems modeled by partial differential equations is by means of time-stepping methodology, where the state of the system is updated at regular discrete time intervals. This method has inherent inefficiencies. In contrast, we have developed a new asynchronous type of simulation based on a discrete-event-driven (as opposed to time-driven) approach, where the state of the simulation is updated on a "need-to-be-done-only" basis. Here we report on this new technique, show several examples, and briefly discuss additional issues that we are addressing concerning algorithm development and their parallel execution.

1 Introduction

The strongly disparate temporal and spatial scales commonly occurring in many complex physical systems pose a significant computational challenge and necessitate a leap in simulation technology. Although the Adaptive Mesh Refinement (AMR) methodology is a powerful technique for addressing large variations in spatial scales, the time update of variables is still done within the traditional time-stepping methodology. This has inherent inefficiencies and suffers from the usual Courant-Friedrichs-Lewy (CFL) limitations. We have developed a novel simulation technique by replacing time stepping with an event-driven method for updating the simulation variables. Event-driven simulations have their origins in operations research and management science, and more recently have found application in war games and telecommunications, but have not been applied to plasma simulations. We have combined traditional mesh discretization techniques with a novel discrete-event methodology and developed several prototype codes. In our discrete-event simulation (DES) approach, the traditional measure of time advance (time-step size) is replaced by the physically meaningful information unit, f. This technique in effect introduces an individual adaptive time line for every computational entity, enabling truly asynchronous time integration of the system state variables. As a result, at any given time the DES model has to process only changes to its global state that exceed the minimum information unit, f. This eliminates unnecessary computation in the inactive regions. Figure 1 illustrates algorithmic flows in the typical explicit time-stepping and discrete-event PDE-based simulations.
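
To make the contrast of Fig. 1 concrete, the following minimal sketch compares the two control flows in C++. All names are illustrative (none are taken from our codes), and the event-driven loop assumes each cell's predicted rate of change stays constant between its own events.

#include <cmath>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Cell {
    double value = 0.0;
    double rate  = 0.0;   // predicted rate of change of the local state
};

// Time-stepped control flow: every cell is advanced at every global step,
// whether or not anything interesting is happening in it.
void timeStepped(std::vector<Cell>& cells, double dt, double tEnd) {
    for (double t = 0.0; t < tEnd; t += dt)
        for (auto& c : cells) c.value += c.rate * dt;
}

// Event-driven control flow: a cell is touched only when its accumulated
// change reaches the information unit f, so quiescent cells cost nothing.
void eventDriven(std::vector<Cell>& cells, double f, double tEnd) {
    using Event = std::pair<double, std::size_t>;   // (event time, cell index)
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events;

    for (std::size_t i = 0; i < cells.size(); ++i)
        if (cells[i].rate != 0.0)
            events.push({f / std::fabs(cells[i].rate), i});

    while (!events.empty() && events.top().first <= tEnd) {
        auto [t, i] = events.top();
        events.pop();
        cells[i].value += (cells[i].rate > 0.0 ? f : -f);    // one unit of change
        events.push({t + f / std::fabs(cells[i].rate), i});  // self-reschedule
    }
}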



Fig. 1. Comparison of control flows in the traditional time-stepping and event-driven simulations (Omelchenko and Karimabadi, 2006a).

[Fig. 2 diagram: a Modeling Application (state variables; code modeling system behavior; I/O and user interface software) interacts with a Parallel Simulation Engine (event list management; managing advances in simulation time) through calls to schedule events and calls to event handlers.]

Fig. 2. Components of discrete event simulation.

Several parallel applications of this technology have been built (Section 3), ranging from an electrostatic particle code to an electromagnetic hybrid (electron fluid, particle ions) code to a diffusion equation solver, and have demonstrated superior metrics:

Faster: By eliminating unnecessary computations, speed-ups as large as a factor of 300 were achieved in one dimension, with further enhancements expected in 2D and 3D.

More Accurate: By updating the system based on the local f rather than the time step, the user has more effective control over the desired numerical accuracy.

Stable: The DES codes run successfully in regimes where standard codes are subject to explosive numerical instabilities.

2 DES Algorithms: Issues and Solutions

As shown in Fig. 2, a discrete-event simulation (DES) system can be broken into two components: (1) the models and (2) the parallel simulation executive that manages events and the progression of simulation time. Development of next-generation plasma codes requires innovations in both components.
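
A minimal sketch of this division (the names are illustrative, not the actual SciDES or engine API): the model schedules events through the executive, and the executive manages the event list and simulation time, calling back into the model's handlers in timestamp order.

#include <functional>
#include <map>
#include <utility>

class SimulationExecutive {
public:
    using Handler = std::function<void(double /*time*/)>;

    // Called by the model ("calls to schedule events").
    void schedule(double time, Handler handler) {
        pending_.emplace(time, std::move(handler));
    }

    // Event-list management: pop events in time order and invoke the model's
    // handlers ("calls to event handlers"); handlers may schedule new events.
    void run(double tEnd) {
        while (!pending_.empty() && pending_.begin()->first <= tEnd) {
            auto it = pending_.begin();
            now_ = it->first;
            Handler handler = std::move(it->second);
            pending_.erase(it);
            handler(now_);
        }
    }

    double now() const { return now_; }

private:
    std::multimap<double, Handler> pending_;   // the event list
    double now_ = 0.0;                         // current simulation time
};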


Fig. 3. The MPDES class collaboration diagram. The MPDES object encapsulates the global simulation geometry properties and defines the table of virtual DES processes.

Field equations are discretized in space in conservation form. Each computational mesh cell is assigned discrete states associated with the temporal evolution of local field and particle quantities. Transitions from one temporal state to another are called "events". Time integration of each field component is delayed by a time interval depending on the magnitude of its predicted rate of change. Particles are scheduled for advance in each cell based on their current velocities, the local field magnitude, and the cell size. Each computational cell keeps a registry of increments to its original state (the state used for the prediction) caused by the neighboring cells and reschedules events (time advances) to earlier times if the cell state is significantly altered during the predicted time delay. The DES code programming architecture is drastically different from that of conventional (time-driven) codes. In particular, each mesh cell has a means of polling its neighbors and fetching global simulation information using its local data handlers. It is also aware of its role in establishing communication with remote (distributed) parts of the system or applying proper boundary conditions.
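
The scheduling and rescheduling rule just described can be sketched as follows; this is a minimal illustration assuming f denotes the smallest change in a field quantity worth resolving, and the Scheduler type and all member names are hypothetical rather than the actual SciDES classes.

#include <cmath>
#include <limits>

struct CellState {
    double field     = 0.0;   // conserved quantity stored in the cell
    double rate      = 0.0;   // rate of change predicted at the last event
    double nextEvent = 0.0;   // currently scheduled self-update time
    double increment = 0.0;   // registry of increments received from neighbors
};

// Delay until the next self-update: the time needed for the predicted rate
// of change to accumulate one information unit f.
inline double predictDelay(const CellState& c, double f) {
    return (c.rate == 0.0) ? std::numeric_limits<double>::infinity()
                           : f / std::fabs(c.rate);
}

// Called when a neighboring cell deposits an increment df at time 'now'.
// If the accumulated increments invalidate the prediction, the pending
// event is rescheduled to an earlier time.
template <class Scheduler>
void onNeighborUpdate(CellState& c, double df, double now,
                      double f, Scheduler& scheduler) {
    c.increment += df;
    if (std::fabs(c.increment) > f && now < c.nextEvent) {
        c.nextEvent = now;
        scheduler.reschedule(&c, now);   // hypothetical scheduler interface
    }
}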

A nontrivial problem is to preserve fluxes across mesh cell interfaces. In explicit time-driven codes, adjacent cells are always advanced with fluxes taken at the same time level. DES cells schedule themselves asynchronously, and therefore special care must be taken to ensure that field quantities in cells with common interfaces are always integrated in time with identical fluxes across the common boundaries.
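
One simple way to guarantee the identical-flux property is sketched below, under the assumption of a one-dimensional mesh with unit cell width: the flux is stored once, on the shared interface, and both adjacent cells integrate against that single record. This illustrates the constraint only; it is not the scheme used in our codes.

#include <array>

struct Interface {
    double flux       = 0.0;   // flux agreed upon by both sides of the face
    double lastUpdate = 0.0;   // simulation time at which that flux was set
};

struct FluxCell {
    double value = 0.0;
    double time  = 0.0;
    std::array<Interface*, 2> faces{};   // left/right faces in 1D

    // Advance this cell to 'tNew' using the fluxes stored on its faces; the
    // neighbor sharing a face integrates the identical flux record, so the
    // conserved quantity is preserved even though the updates are asynchronous.
    void advance(double tNew) {
        const double dt = tNew - time;
        value += dt * (faces[0]->flux - faces[1]->flux);   // unit cell width
        time = tNew;
    }
};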

We have developed a library of C++ classes (SciDES) designed to provide a set of discrete-event software tools for implementing finite-difference and particle-in-cell methods for the solution of coupled partial differential equations and equations of particle motion. SciDES standardizes fundamental data structures and algorithms for programming distributed, time-dependent scientific models on block-structured computational domains and formalizes the most essential aspects of the distributed physics-based DES models in the form of a pseudo-distributed architecture. This pursues several goals. First, the SciDES API separates the computational physics algorithms from the communication issues by abstracting them into well-defined concepts (C++ classes) and providing all the necessary "go-between" implementation details.


Second, it fosters more efficient cooperation of computational physicists with computer scientists working on the distributed discrete-event engine algorithms, since it allows substitution of pseudo-distributed plug-in modules by their MPI counterparts in a plug-and-play fashion without breaking the physics core of the code. In addition, the ability to run virtual distributed simulations on a single CPU enables testing various physical mechanisms that provide important insight into predictive properties of physics-based parallel discrete-event simulations.
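
The plug-and-play substitution can be pictured as follows. The class and method names are hypothetical (they are not the actual SciDES classes), but they show how a single abstract ghost-exchange interface lets a pseudo-distributed and an MPI-backed implementation be swapped without touching the physics core.

#include <vector>
#include <mpi.h>

class GhostExchange {                       // what the physics core sees
public:
    virtual ~GhostExchange() = default;
    virtual void exchange(int myRank, int neighborRank,
                          const std::vector<double>& send,
                          std::vector<double>& recv) = 0;
};

// Pseudo-distributed version: all "remote" subdomains live in one address
// space, which lets the whole virtual simulation run and be tested on one CPU.
// Assumes the neighbor has already posted its boundary for this exchange.
class LocalExchange : public GhostExchange {
public:
    explicit LocalExchange(std::size_t ranks) : posted_(ranks) {}
    void exchange(int myRank, int neighborRank,
                  const std::vector<double>& send,
                  std::vector<double>& recv) override {
        posted_[myRank] = send;              // publish our boundary cells
        recv = posted_[neighborRank];        // read the neighbor's boundary
    }
private:
    std::vector<std::vector<double>> posted_;
};

// MPI counterpart implementing the same interface; the physics core that
// holds a GhostExchange reference is unchanged when this module is plugged in.
class MpiExchange : public GhostExchange {
public:
    void exchange(int /*myRank*/, int neighborRank,
                  const std::vector<double>& send,
                  std::vector<double>& recv) override {
        MPI_Sendrecv(send.data(), static_cast<int>(send.size()), MPI_DOUBLE,
                     neighborRank, 0,
                     recv.data(), static_cast<int>(recv.size()), MPI_DOUBLE,
                     neighborRank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
};

Because the physics core holds only the abstract reference, the same executable can run as a virtual distributed simulation on one CPU or as a genuinely distributed MPI job.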

An example of our SciDES architecture, the class MPDES, which abstracts the virtual multi-processor DES environment, is shown in Fig. 3. In this diagram solid arrows are indicative of inheritance (the "is a" relationship), dark dashed arrows represent ownership (the "has a" relationship), and light dashed arrows mark class instantiation from template classes.

2.1 Parallelization

The parallelization of asynchronous (event-driven) continuous PIC models presents a number of challenges. As in conventional (time-driven) simulations, it is realized by decomposing the global computation domain into subdomains. In each subdomain, the individual cells and particles are aggregated into containers, which are mapped to distributed parallel processors in a way that achieves maximum load-balancing efficiency. The parallel execution of conventional (time-driven) simulations is commonly achieved by copying field information from the inner lattice cells to the ghost cells of the neighboring subdomains and exchanging out-of-bounds particles between the processors at the end of each update cycle. In contrast, in parallel asynchronous PIC simulations both particle and field events are not synchronized by the global clock (i.e., they do not take place at the same time levels throughout the simulation domain), but occur at arbitrary time intervals, which may introduce synchronization problems if some processors are allowed to get ahead in time of other processors (the "optimistic" approach). As a result, a processor may receive an event message from a neighbor with a simulation time stamp that is in its own past, thus causing a causality error. On the other hand, parts of a distributed discrete-event simulation can be forced to execute synchronously with remote tasks corresponding to the neighboring subdomains (the "conservative" approach). If so, the parallel speed-up critically depends on the underlying domain decomposition technique and additional predictive ("look-ahead") properties of the simulation in question. Regardless of the approach taken, it is important to note that DES computations offer substantial speed-ups compared to conventional explicit time-driven simulations due to the reduced amount of computation that must be performed.

The following are some of the important issues that must be addressed in parallel discrete event simulations of continuous systems:

Synchronization: This is by far the paramount issue to be carefully resolved for achieving the best parallel execution performance. Broadly, there are two commonly used approaches: conservative and optimistic.

Conservative: This approach always ensures safe timestamp-ordered processing. However, runtime performance is critically dependent on a priori determination of an application property called lookahead, which is roughly dependent on the degree to which the computation can predict future computations without global information. In one conservative approach, events that are beyond the next lookahead window are blocked until the window advances sufficiently far to cover those events. Typically the lookahead property is very hard to extract in complex applications, as it tends to be implicitly defined in the source-code interdependencies. The appeal of this approach, however, is that it is one of the easiest schemes to implement if the lookahead is somehow specified by the application.
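
As a rough illustration of this blocking rule (the names and bookkeeping are ours, not any particular engine's), a processor may process a local event only if no neighbor can still deliver an event with a smaller timestamp:

#include <algorithm>
#include <limits>
#include <vector>

struct NeighborBound {
    double earliestSend;   // lower bound on the neighbor's next outgoing timestamp
    double lookahead;      // it promises not to affect us earlier than this ahead
};

// A local event at time t is safe only if it lies inside the current lookahead
// window; otherwise processing blocks until the window advances to cover it.
bool safeToProcess(double t, const std::vector<NeighborBound>& neighbors) {
    double horizon = std::numeric_limits<double>::infinity();
    for (const auto& n : neighbors)
        horizon = std::min(horizon, n.earliestSend + n.lookahead);
    return t <= horizon;
}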

Optimistic: This approach avoids blocked waiting by optimistically processing the events beyond the lookahead window. When some events are later detected to have been processed in incorrect order, the system invokes compensation code such as state restoration or reverse computation (Tang et al., 2006). Since blocking is not used, the lookahead value is not as important, and could even be specified to be zero without affecting the runtime performance. While this approach eliminates the problem of lookahead extraction, it has a different challenge, namely support for compensating code.
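
A toy sketch of this idea, with compensation implemented as an explicit undo log (reverse computation proper reverses the operations rather than storing saved state; see Tang et al., 2006):

#include <vector>

struct LoggedEvent {
    double time;
    double undoDelta;   // what to subtract to reverse this event
};

struct OptimisticCell {
    double value = 0.0;
    double now   = 0.0;
    std::vector<LoggedEvent> log;

    // Optimistic execution: process the event immediately and remember how
    // to undo it in case a straggler message arrives later.
    void process(double t, double delta) {
        value += delta;
        now = t;
        log.push_back({t, delta});
    }

    // A message with timestamp tStraggler < now violates causality: reverse
    // every event processed after it, then resume from the straggler's time.
    void rollback(double tStraggler) {
        while (!log.empty() && log.back().time > tStraggler) {
            value -= log.back().undoDelta;   // compensation code
            log.pop_back();
        }
        now = tStraggler;
    }
};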

Combination: Sometimes it might help to have some parts of the application execute optimistically ahead (e.g., parts for which lookahead is low or hard to extract), while other parts execute conservatively (e.g., parts for which lookahead is large, or for which compensation code is difficult to generate). In such cases, a combination of conservative and optimistic synchronization techniques can be appropriate.

Load Balancing: As with any parallel/distributed application, the best performance is obtained when the load is evenly balanced across all resources. In parallel simulation in particular, load imbalance can have a very adverse effect. This is because typically the slowest processor can hold back the progress of simulation (virtual) time, which in turn slows down even those processors which are relatively lightly loaded.

Automated/Adaptive: Automated schemes are preferable for load balancing at runtime. These schemes vary with the particular synchronization approach used.

Support Primitives: In order to permit automated/adaptive load balancing by the system, it is important to provide appropriate primitives to the application, so that application-level entities can be moved across processors easily by the system in a transparent manner as needed.
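
One possible shape for such a primitive (purely illustrative, not the actual engine interface):

#include <vector>

class Migratable {
public:
    virtual ~Migratable() = default;
    virtual void pack(std::vector<double>& buffer) const = 0;    // serialize state
    virtual void unpack(const std::vector<double>& buffer) = 0;  // restore state
};

An entity addressed only through such an interface, rather than by processor rank, can then be migrated at runtime by an adaptive load balancer without the application's involvement.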

Modeling and Runtime Interface: To be able to decouple the implementation details of the parallel simulation executive from the application/models, it is best to define the model-simulation interface in an implementation-independent fashion. This not only helps avoid reimplementation of models whenever the engine is changed, but also permits experimentation with multiple synchronization and load-balancing approaches for the same application. Additionally, it enables engine-level optimizations to remain transparent to the application, so that the application developer is not burdened or sidetracked with such issues during model development.

With the preceding issues in mind, we are carefully developing appropriate interfaces and implementations of our parallel execution engine. A brief description of our approach follows:


Fig. 4. Comparison of intermediate shock (IS) profiles obtained with the time-stepping and DES codes. [Figure: profiles of By, Bz, N, B, and Vx versus X at Ωt = 200; red: discrete event, green: predictor-corrector, black: implicit; the right-hand panels are enlarged and shifted around the intermediate shock (IS) and fast shock.]

  • The synchronization issue is being resolved by providing a transparent interface that does not mandate one synchronization approach over another. The underlying implementation is also being developed such that different model entities can choose different synchronization (conservative or optimistic execution style), as is most appropriate for them.

  • The load balancing issue is being addressed by the use of an "indirect messaging" interface layer that decouples application entities from their processor mapping.

  • The modeling and runtime interface is also kept abstract and flexible, so that radically different implementations can be provided underneath the interface.

We are also developing a general-purpose tool to help predict the performance of parallel/distributed discrete event and time-stepped simulations on massively parallel platforms (Perumalla et al., 2005). It is intended to be useful in experimenting with and understanding the effects of execution parameters, such as different load balancing schemes and mixtures of model fidelity.


Fig. 5. Comparison of solution profiles in the high-Mach-number shock turbulence, obtained with the time-stepping and DES codes.


We will demonstrate the power of this new methodology through three examples: (i) the sheath problem, (ii) the diffusion equation with a non-uniform diffusion coefficient, and (iii) the fast magnetosonic shock and the associated particle acceleration.

3 Model Verification and Results

We have successfully applied the discrete event simulations to a host of problems, including (i) electrostatic simulations (Karimabadi et al., 2005a; Tang et al., 2006), (ii) explicit time integration of multi-scale, flux-conservative partial differential equations with source terms (Omelchenko and Karimabadi, 2005), (iii) electromagnetic hybrid simulations (Karimabadi et al., 2005b, 2006; Omelchenko and Karimabadi, 2006b; Perumalla et al., 2006) and gas dynamics (Omelchenko and Karimabadi, 2006c).


Fig. 6. Phase space diagram obtained with the DES and predictor-corrector codes for the high-Mach-number shock shown in Fig. 5.

Here we show an example of a DES-based hybrid simulation. Details of the technique can be found in Omelchenko and Karimabadi (2006b). First, we benchmarked our DES code against a time-stepped hybrid code for rotational discontinuities, intermediate shocks, slow shocks, and low-Mach-number fast magnetosonic shocks. Figure 4 shows the comparison of intermediate shock (IS) profiles obtained with the time-stepping (predictor-corrector and resistive) and discrete event codes (Omelchenko and Karimabadi, 2006b). There is excellent agreement between the time-stepped and DES codes. Small variations in the shock phases and amplitudes in the three codes are mostly caused by subtle differences in their particle initialization and injection schemes. Having done this benchmarking, we then moved to a more computationally challenging problem, namely the simulation of a high-Mach-number magnetosonic shock. Figure 5 compares the DES solution profile with those obtained in the corresponding time-stepping simulations. The event-driven simulation shows four well-resolved zones of shock-driven turbulence: (i) the low-frequency steepened oscillations observed far upstream ("shocklets"), (ii) the short-wavelength oscillations in the near-upstream region (driven by the reflecting ions), (iii) the coherent shock transition region (where the upstream oscillations get compressed and amplified), and (iv) the long-wavelength waves in the downstream region. On the other hand, the time-stepped solution is found to differ drastically from the DES solution, as it achieves poorer resolution in all regions of the computation domain.


The differences between the DES and time-stepped codes are further illustrated in the phase space diagrams of Fig. 6. The abnormally rapid decay of coherent oscillations in the time-stepping simulations was noted previously (Quest, 1988). Comparisons with the DES model indicate that this is caused by numerical errors, which lead to phase de-synchronization between particle and field dynamics.

The event-driven algorithm described in this paper is fully extendable to multiple dimensions and nonuniform meshes. Currently we are developing a uni-dimensional infrastructure with adaptive logical mapping capabilities. We are also in the process of incorporating DES technology for temporal update on Structured Adaptive Mesh Refinement (SAMR) meshes, which will lead to even further speed-ups as well as much better accuracy control than SAMR alone.

Acknowledgments. This research was supported by the National Science Foundation Information Technology Research (ITR) grant numbers 0529919 at SciberQuest, Inc. and 0326431 at Georgia Institute of Technology. Some of the computations were performed at the San Diego Supercomputer Center.

References

Karimabadi, H., J. Driscoll, Y. A. Omelchenko, and N. Omidi, A new asynchronous methodology for modeling of physical systems: breaking the curse of Courant condition, J. Comp. Phys., 205(2), 755–775, 2005a.

Karimabadi, H., Y. Omelchenko, J. Driscoll, R. Fujimoto, K. Perumalla, and D. Krauss-Varban, A new simulation technique for study of collisionless shocks: self-adaptive simulations, 4th IGPP Astrophysics Conference Proceedings, 2005b (in press).

Karimabadi, H., J. Driscoll, J. Dave, Y. Omelchenko, R. Fujimoto, K. Perumalla, and N. Omidi, Parallel Discrete Event Simulations of Grid-based Models: Asynchronous Electromagnetic Hybrid Code, in Applied Parallel Computing: State of the Art in Scientific Computing, Springer-Verlag Lecture Notes in Computer Science Proceedings, 3732, 573, 2006.

Omelchenko, Y. A. and H. Karimabadi, Self-adaptive time integration of flux-conservative equations with sources, J. Comp. Phys., 216, 179, 2006a.

Omelchenko, Y. A. and H. Karimabadi, Event-driven hybrid particle-in-cell simulation: A new paradigm for multi-scale plasma modeling, J. Comp. Phys., 216, 153, 2006b.

Omelchenko, Y. A. and H. Karimabadi, A time-accurate explicit multi-scale technique for gas dynamics, J. Comp. Phys., 2006c (in press).

Perumalla, K., R. Fujimoto, Thakare, Pande, H. Karimabadi, Y. Omelchenko, and J. Driscoll, Performance prediction of large-scale parallel discrete event models of physical systems, Proceedings of the 2005 Winter Simulation Conference, edited by M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, 2005 (in press).

Perumalla, K., R. Fujimoto, and H. Karimabadi, Scalable simulation of electromagnetic hybrid codes, Computational Science—ICCS, Lecture Notes in Computer Science, 3992, 41, 2006.

Quest, K., Theory and simulations of collisionless parallel shocks, J. Geophys. Res., 93(A9), 9649, 1988.

Tang, Y., K. Perumalla, R. Fujimoto, H. Karimabadi, J. Driscoll, and Y. Omelchenko, Optimistic simulations of physical systems using reverse computation, Simulation: Transactions of the Society for Modeling and Simulation International, 82, 61, 2006.