2009 IEEE International Conference on Cloud Computing, Bangalore, India. DOI 10.1109/CLOUD.2009.79. © 2009 IEEE.

Optimistic Synchronization of Parallel Simulations in Cloud Computing Environments

Asad Waqar Malik (1), Alfred Park (2), Richard M. Fujimoto (3)

(1) National University of Science and Technology, Pakistan. Asad_maalik@yahoo.com
(2) IBM T.J. Watson Research Center, Yorktown Heights, USA. ajpark@us.ibm.com
(3) Computational Science and Engineering Division, Georgia Institute of Technology, USA. fujimoto@cc.gatech.edu

Abstract

Cloud computing offers the potential to make parallel discrete event simulation capabilities more widely accessible to users who are not experts in this technology and do not have ready access to high performance computing equipment. Services hosted within the “cloud” can potentially incur processing delays due to load sharing among other active services, and can cause optimistic simulation protocols to perform poorly. This paper proposes a mechanism termed the Time Warp Straggler Message Identification Protocol (TW-SMIP) to address optimistic synchronization and performance issues associated with executing parallel discrete event simulation in cloud computing environments.

1. Introduction

Parallel discrete event simulation (PDES) refers to the execution of a discrete event simulation program across multiple processors. Typically this is done to scale simulations to larger configurations, to increase the detail and fidelity of the model, and/or to reduce execution time [1]. PDES has been applied to a variety of applications such as modeling large-scale telecommunication networks [2], manufacturing [3], and transportation systems [4], to mention a few.

PDES is increasing in importance. As computational hardware becomes increasingly parallel (e.g., multi-core CPUs) with less focus on clock speed improvements, simulation developers are turning to parallelism as a means to address performance concerns. However, a significant impediment to widespread exploitation of PDES technology is the skill and expertise required to develop and execute PDES codes. Cloud computing offers a potential solution by hiding many of the details of simulation execution from end users. Most work to date has focused on executing PDES codes on tightly coupled high performance computing platforms, using a set of computing nodes dedicated to the simulation for the duration of the run. Cloud computing offers the ability to relax this requirement.

A PDES program consists of a collection of logical processes (LPs) that communicate by exchanging time stamped messages or events. Here, we use the terms events and messages synonymously, unless stated otherwise. Each LP is a sequential discrete event simulation that operates by processing events in time stamp order to model the evolution of the system over time. The computation associated with each event involves modifying state variables to reflect changes in the system occurring at that instant in time, and (possibly) scheduling new events for this or other LPs with timestamps in the simulated future.
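To make the LP model concrete, the following sketch shows the core loop of a single (sequential) LP in Python. The class and method names are ours, chosen for illustration; they are not taken from any particular PDES engine.

    import heapq
    import itertools

    class LogicalProcess:
        """Minimal sketch of one LP: events are executed in timestamp order."""
        def __init__(self, name):
            self.name = name
            self.now = 0                    # local simulation time
            self.state = {}                 # model state variables
            self._seq = itertools.count()   # tie-breaker for equal timestamps
            self._pending = []              # heap of (timestamp, seq, handler)

        def schedule(self, timestamp, handler):
            heapq.heappush(self._pending, (timestamp, next(self._seq), handler))

        def run(self, end_time):
            # Local causality constraint: always process the smallest timestamp next.
            while self._pending and self._pending[0][0] <= end_time:
                self.now, _, handler = heapq.heappop(self._pending)
                handler(self)               # may update state and schedule new events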

A fundamental problem in PDES concerns the synchronization of the parallel simulation program. Each LP must process incoming messages (events) in time stamp order, the so-called local causality constraint. This is necessary to ensure that events in the simulated future do not affect events in the past. However, if an LP has received an event with, say, time stamp 10, how can it be sure no event will later arrive from another LP with a time stamp smaller than 10? This issue is referred to as the synchronization problem.

Time Warp [5] is a well-known approach to addressing the synchronization problem that uses rollbacks. Each LP is allowed to process whatever events it has received. If it receives a new event with time stamp smaller than other events it has already processed, it must undo or roll back the computations for these events and re-execute them in the proper (time stamp) sequence. If this computation sent one or more messages to other LPs, the rollback must “unsend” these messages. A mechanism called “anti-messages” is used to cancel these events. Rollback-based mechanisms, more generally referred to as optimistic synchronization, are described in greater detail in [1].
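The sketch below illustrates this rollback bookkeeping in Python; it is a simplified sketch of the idea rather than any production Time Warp kernel. A snapshot of LP state is saved before each event, a straggler pops snapshots back to the straggler's timestamp, and anti-messages are issued for messages the undone events had sent (send_anti_message is a hypothetical transport hook, not a real API).

    import copy

    def send_anti_message(anti):
        """Hypothetical transport hook: a real kernel would route this
        anti-message to the destination LP for cancellation."""
        pass

    class TimeWarpLP:
        def __init__(self):
            self.now = 0
            self.state = {}
            self.processed = []   # (timestamp, event), in processed order
            self.snapshots = []   # state saved just before each processed event
            self.sent = []        # (send_timestamp, anti_message) for "unsending"
            self.to_redo = []     # rolled-back events awaiting re-execution

        def process(self, ts, event):
            if ts < self.now:                      # straggler: causality violated
                self.rollback(ts)
            self.snapshots.append((self.now, copy.deepcopy(self.state)))
            self.now = ts
            self.processed.append((ts, event))
            # ... the model's event handler runs here, appending to self.sent ...

        def rollback(self, straggler_ts):
            # Undo every event with a timestamp at or beyond the straggler's.
            while self.processed and self.processed[-1][0] >= straggler_ts:
                self.to_redo.append(self.processed.pop())
                self.now, self.state = self.snapshots.pop()
            # Cancel messages produced by the undone events with anti-messages.
            while self.sent and self.sent[-1][0] >= straggler_ts:
                _, anti = self.sent.pop()
                send_anti_message(anti)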

Cloud computing is a paradigm where software is provided as a service and computing resources are virtualized. Clouds are often implemented on servers that can be accessed by clients operating at remote locations [6]. By satisfying the computing needs of users without requiring them to possess the skill to execute complex application codes or manage sophisticated computing facilities, cloud computing attacks a long-standing problem faced by the PDES community: simplifying exploitation of the technology by domain scientists and engineers who are not experts in PDES techniques.

In the cloud computing paradigm, nodes, possibly at different geographic locations, perform computations on behalf of a client and are linked together by high-speed network connections. Data and application code reside and execute within the cloud. Compute nodes within the cloud will, in general, share resources with other computations.

Cloud computing has several features in common with other computing paradigms such as grid, utility, and autonomic computing [7, 8], and elements of the work described here, e.g., the TW-SMIP protocol, may also be applicable to those environments. Technologies that facilitate cloud computing are still evolving. Approaches to cloud computing include the Globus virtual workspace service [9], OpenNebula [10, 11], and Amazon Elastic Compute Cloud [12]. Cloud computing provides a platform-neutral environment for the development of applications. This approach to distributed computing is of particular interest in PDES, where careful consideration of the hardware platform is currently required.

We propose an approach to achieve efficient execution of optimistic PDES systems in cloud computing architectures. Execution of traditional optimistic PDES systems on such architectures can lead to an excessive number of rollbacks. We define the TW-SMIP protocol that dynamically adjusts the execution of each LP based on local parameters and straggler messages. The protocol avoids barrier synchronizations, and instead dynamically limits forward execution of LPs to reduce the amount of erroneous computation and generation of incorrect messages.

In the remainder of this paper we first describe characteristics of the overall system that is envisioned, and the challenges these present for Time Warp programs. Related work is summarized, followed by a description of the software architecture that is proposed. The TW-SMIP protocol is then described, followed by performance results and conclusions.

2. Issues and Challenges

Time Warp consists of two distinct components: a local control and a global control mechanism. Local control (i.e., state management, rollback recovery and anti-messages) is implemented within each processor, independent of the other processors. The global control mechanism is used to commit operations such as I/O that cannot be rolled back and to reclaim memory resources through computing a Global Virtual Time (GVT) value. GVT is the minimum simulation time among all unprocessed or partially processed messages and anti-messages in the system.
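As a small illustration of the GVT definition (not of any particular distributed GVT algorithm), the minimum can be written as follows; obtaining a consistent view of the in-transit message set is precisely what distributed GVT algorithms exist to do.

    def estimate_gvt(local_virtual_times, in_transit_timestamps):
        """GVT = min over every LP's local virtual time and the timestamp of
        every unprocessed or in-transit message and anti-message. This sketch
        assumes both collections have already been gathered consistently."""
        candidates = list(local_virtual_times) + list(in_transit_timestamps)
        return min(candidates) if candidates else float("inf")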

Challenges in executing Time Warp under a cloud computing architecture include:

1. Effective utilization of resources
2. Load distribution
3. Network traffic and communication
4. Fault tolerance
5. Process synchronization

These capabilities must be provided automatically and transparently to application programs within the cloud. They present certain challenges concerning the optimistic execution paradigm used in Time Warp. These challenges motivate the synchronization algorithm described later. We touch upon each of these issues below.

Traditional approaches to PDES and most work using Time Warp to date assume a fixed set of dedicated computing resources, and typically do not address fault tolerance concerns. These assumptions are too restrictive for cloud environments. Rather, we assume resources are shared among multiple applications executing in the cloud. New resources may become available during the execution of a long-running Time Warp program as existing jobs complete or existing resources may become more heavily utilized as new jobs are initiated on behalf of this or other clients.

Unlike traditional parallel computing applications, where a poorly balanced system results in idle processors while others are overburdened with computation, a poorly balanced Time Warp program may not produce idle processors: processors without sufficient workload may be optimistically performing computations that are later rolled back. Workload assessments must therefore account for rolled-back computation when judging the load placed on a processor [13].

Network traffic and communication delays must be considered if the cloud computing infrastructure includes geographically distributed resources. Delayed messages may increase the number of straggler messages and rollbacks [14].

The Time Warp program must be able to tolerate failures in the underlying computing infrastructure. It should be able to run to completion despite processor and storage failures or network outages. Finally, as mentioned above, synchronization is a fundamental problem that must be addressed in order to achieve efficient execution of Time Warp programs in cloud computing environments.

3. Related Work

While there is little work to date concerning synchronization of Time Warp programs in cloud computing environments, synchronization on conventional parallel and distributed computing platforms is a mature area of research. Several synchronization techniques employ optimism control, limiting the forward execution of LPs to improve performance. Examples include the Moving Time Window protocol (MTW), Adaptive Time Warp (ATW), Breathing Time Warp (BTW) [15, 16] and Local Time Warp (LTW), among others. The Wolf Calls [17] mechanism is perhaps most closely related to the approach described here: Wolf Calls are special messages used to minimize the cascading rollback effect. However, these mechanisms do not address concerns particular to cloud environments, especially the need to execute over non-dedicated computing resources.

A master/worker metacomputing approach to PDES across loosely coupled resources has been explored through the Aurora system as a volunteer computing paradigm [18, 19]. Other research examines PDES on grid computing infrastructures, e.g., see [20]. These efforts address issues associated with geographical distribution, e.g., large communication delays, but do not address the interplay between optimistic synchronization and resource allocation.

4. A Cloud Architecture for Time Warp

Cloud computing integrates different service technologies. Cloud infrastructure, such as virtualized hardware and storage systems, is delivered as a service. Similarly, application frameworks and software stacks can be realized as platform services in the cloud. A cloud computing architecture unifies and simplifies access to resources without burdening end users with the costs and complexities of acquiring and managing the underlying hardware and software layers. However, implementing existing optimistic PDES engines as platform and software services on cloud computing systems can result in poor performance due to the issues discussed in Section 2. Efficient execution of optimistic PDES application codes in the cloud will require a new software infrastructure and algorithms that are aware of the underlying cloud infrastructure.

It is instructive to consider implementing a cloud-based PDES system over an existing infrastructure such as Hadoop [21]. Hadoop is a framework for running applications on large clusters built from commodity hardware. Hadoop uses a master/worker architecture to implement the Map/Reduce paradigm [22]. A single master server called the jobtracker interfaces the system to the user. Each worker or slave is termed a tasktracker; one tasktracker is instantiated on each node of the cluster. When a job is submitted to Hadoop, it is received by the jobtracker, which splits the job into multiple tasks. The jobtracker then distributes these tasks to tasktrackers, which execute them in parallel. The Hadoop file system uses a master/worker approach, where the master is called the NameNode and slaves are called DataNodes. More than one DataNode can be mapped to a single processor.

Hadoop’s Map/Reduce paradigm can be used to execute PDES codes. Each LP and the set of events in its message queues correspond to a Hadoop task. The initial job is submitted to the jobtracker, and is split into multiple tasks (LPs) that execute independently across computation nodes. Events are processed in the Map phase, and Reduce is used to distribute new events resulting from these computations among the tasks. Distributed File System blocks in Hadoop provide an implementation of fault-tolerant state and event queues for each LP.
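A schematic of this mapping might look as follows. This is a sketch of the idea, not the paper's implementation; handle_event stands in for the model-specific event handler, and all names are ours.

    def map_phase(lp_id, lp_state, pending_events, handle_event):
        """Map: one task advances one LP, processing its pending events in
        timestamp order and emitting (destination_lp, event) pairs."""
        emitted = []
        for ts, payload in sorted(pending_events):
            lp_state, new_events = handle_event(lp_state, ts, payload)
            emitted.extend(new_events)   # each: (dest_lp, (timestamp, payload))
        return lp_state, emitted

    def reduce_phase(dest_lp, events_for_lp):
        """Reduce: gather all events addressed to one LP; the sorted result
        becomes that LP's pending input for the next Map round."""
        return dest_lp, sorted(events_for_lp)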

Although Map/Reduce provides a straightforward means to execute PDES codes, this approach has certain performance difficulties. An excessive amount of file I/O and data migration will severely limit performance. Further, a communication mechanism tailored to the LP-to-LP communication that occurs in Time Warp would yield a more efficient realization.

These observations suggest that a fruitful approach might be to utilize the overall structure defined by Hadoop, but develop an entirely different code base, one that is optimized for executing Time Warp codes. This is the approach that was adopted here.

The Time Warp cloud computing architecture that was developed is shown in Figure 1. Each LP includes relevant state and event information. The input to each LP is a set of events. LPs may execute in parallel and generate new input events for other LPs. By exploiting the Hadoop architecture, the Time Warp cloud computing architecture can provide many of the benefits of cloud computing, such as load balancing, dynamic resource allocation and fault tolerance, in a straightforward manner. For example, in our implementation, multiple LPs are mapped to a single processor to improve performance, similar to the way Hadoop maps DataNodes.


Figure 1. Time Warp framework for a cloud computing environment

The data structures used here are not unlike those defined by Jefferson [5]. He defined an architecture for Time Warp that utilizes an event queue to hold events, an output queue to hold anti-messages that may later be needed upon rollback, and a state queue to hold snapshots of LP state to implement rollback, if needed. The event queue holds both processed and unprocessed events. Storage for processed events with timestamps less than GVT can be reclaimed, or fossil collected. A hash table is used to directly access processed events within the event queue and to easily locate message/anti-message pairs, reducing the time needed to find an event to be cancelled. The hash function generates indices from the received message's timestamp and source LP identifier. In this implementation a single hash table is used for all LPs mapped to a single processor to reduce complexity.
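A minimal sketch of such a hash table is shown below; field names are illustrative. Keying on the (timestamp, source LP) pair lets an arriving anti-message locate its positive counterpart in expected constant time instead of scanning the event queue.

    # One shared table per processor, as in the implementation described above.
    event_index = {}   # (timestamp, source_lp) -> event record

    def index_message(msg):
        event_index[(msg["timestamp"], msg["source_lp"])] = msg

    def cancel_message(anti):
        """Remove and return the message matching this anti-message, or None.
        If the matched message was already processed, the caller must first
        roll the LP back before discarding it."""
        return event_index.pop((anti["timestamp"], anti["source_lp"]), None)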

5. The TW-SMIP Protocol

A synchronization algorithm is required to manage optimistic execution within the cloud. This algorithm must allow for highly asynchronous execution, much more so than traditional algorithms intended for execution on dedicated platforms. For example, one must allow for unpredictable suspension of subsets of LPs. This can lead to imbalances where other LPs execute far ahead of the suspended ones in simulation time, leading to excessively long rollbacks and instabilities. Mechanisms requiring tight synchronization such as global barriers are clearly undesirable; rather, loosely coupled asynchronous mechanisms are needed. Message communications may incur large delays, increasing the likelihood of stragglers and overly optimistic execution. Mechanisms to address these and other causes of overly optimistic execution must be provided.

TW-SMIP is an optimistic synchronization protocol intended to address these concerns. Periodic status messages, termed heartbeat (HB) messages, are distributed to LPs residing on a processor to provide information concerning LPs residing on other processors that may send messages. These HB messages are superimposed over a standard Time Warp mechanism. HB messages include information concerning sent messages for straggler detection. They should be given higher priority than other messages, and should not be subject to message bundling in order to minimize their latency.

Figure 2. HB message communication among three LPs (P, Q, R). HB messages are not synchronized.

After a fixed interval of time, each processor enters a straggler message identification phase and sends HB messages to the other processors with which it may communicate, as shown in Figure 2, according to the protocol variant described later. HB messages constrain the execution of LPs that may have advanced too far into the simulated future. LPs can generate HB messages independent of other LPs in the simulation.

HB messages consist of two array fields: the timestamp (TS) and the message identification number (MID). Arrays are used to hold information concerning multiple messages. For example, a source LP (LPi) fills these two fields with the timestamps and message identification numbers of the messages it has generated for a destination LP (LPj). During the simulation, each LP saves the TS and MID values of each message it has sent to or received from other LPs. Upon receiving an HB message, each LP compares the received information with its locally saved information. Thus, each LP has two lists: one maintained by the LP that logs messages as they arrive, and a second that is created upon receipt of an HB message. If the lists are not identical, then straggler messages exist in the system. This immediately interrupts event processing, and the LP rolls back the simulation to the point where a straggler message is expected. If the timestamp of the straggler message is greater than the local time of the destination node, the LP keeps processing events until the time of the straggler message is reached, and then pauses. There is a possibility that an HB message may be delayed. Under these circumstances, the receiving LP simply ignores the HB message if all the messages identified in it have been received, and the corresponding entries in the receive list are removed.

Figure 3. Straggler identification between two processes (LPi and LPj)

An example is shown in Figure 3. LPi sends three messages to LPj with timestamps t1, t2 and t3. LPi keeps sent event information in a list; similarly, LPj maintains received message information. The messages with timestamps t1 and t3 have been delivered, while t2 is in transit. After a period of wall clock time, LPi sends an HB message to LPj. Upon receipt of the HB message, LPj compares the information encoded in the HB message with its local received list. Because the lists do not match, a straggler message is present in the system. LPj stops processing t3, rolls the simulation back to timestamp t2, and waits to receive the straggler message.
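The list comparison in this example reduces to a set difference. The sketch below uses the TS/MID fields described above (all other names are illustrative) and reproduces the Figure 3 scenario.

    def find_stragglers(hb_ts, hb_mid, received_log):
        """Return the (timestamp, mid) pairs announced in an HB message that
        have not arrived locally; these are the stragglers still in transit."""
        announced = set(zip(hb_ts, hb_mid))
        return sorted(announced - set(received_log))

    # Figure 3: t1, t2, t3 announced, but only t1 and t3 have been received.
    stragglers = find_stragglers([1, 2, 3], ["m1", "m2", "m3"],
                                 [(1, "m1"), (3, "m3")])
    # -> [(2, 'm2')]: roll back to just before timestamp 2 and wait for m2.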

The TW-SMIP protocol for parallel optimistic simulation in cloud computing environments is based on straggler message identification to avoid the frequent rollbacks caused by the asymmetric and uneven processing loads that can be expected to arise. TW-SMIP performs boundary-based synchronization of LPs running on distributed nodes in the cloud architecture. Here we assume that the network guarantees reliable delivery of messages, although messages may be delivered out of order, and that multiple LPs may be mapped to a single processor.

The protocol distinguishes among three strategies for distributing HB messages:

1. Complete network-based SMIP (CNB-SMIP)
2. Partial network-based SMIP (PNB-SMIP)
3. Group-based SMIP (GB-SMIP)

These three SMIP variants offer different tradeoffs concerning congestion, autonomous administration, and communication overhead. CNB-SMIP uses a simple broadcast approach: after a fixed wall clock period, each processor sends HB messages to all other processors participating in the simulation. This approach is best suited to architectures that provide high performance broadcast communication services. The partial network-based approach (PNB-SMIP) is designed to reduce communication overhead at the cost of some additional complexity: HB messages are sent only to processors with which communication has occurred since the last computed GVT value. This approach can significantly reduce the number of HB messages generated during the simulation. The group-based approach (GB-SMIP) is designed for use where distinct administrative domains exist. This scheme arranges LPs in a tree structure to distribute HB messages; each group has a root node that is responsible for controlling the HB messages sent to its leaf nodes.
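The difference between the three variants lies in which processors receive a given HB message. A sketch of the recipient selection, with all parameter names ours:

    def hb_recipients(variant, me, all_procs, peers_since_gvt, group_root):
        """Select HB destinations under the three SMIP dissemination schemes."""
        if variant == "CNB":     # broadcast to every participating processor
            return [p for p in all_procs if p != me]
        if variant == "PNB":     # only peers communicated with since the last GVT
            return sorted(peers_since_gvt)
        if variant == "GB":      # report to the group's root node, which controls
            return [group_root]  # HB messages within its administrative domain
        raise ValueError("unknown SMIP variant: %s" % variant)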

6. Performance Study

The following empirical study examines the behavior of the TW-SMIP protocol under different asymmetric and symmetric processor loads. A comparison of the proposed protocol with a traditional Time Warp mechanism is provided to quantify the improvement from straggler message identification and validate its utility. The experiments were performed on dual-core 3.2 GHz Intel Xeon processors with 6 GB of memory per node. Red Hat Linux running a 64-bit 2.6.9 kernel was installed on each machine. Nodes were interconnected via Fast Ethernet. Fourteen of these nodes were used in the following tests. The PNB-SMIP implementation was used across all runs.

To analyze the performance of the TW-SMIP implementation, a test case based on an electric power distribution system [23] was used. In this simulation program, each node acts as a source and generates two types of messages: self-messages and propagating messages. Self-messages are those sent by a source node to itself with a defined timestamp increment. Propagating messages are sent to other nodes in the network. Messages are generated with a timestamp of T_local + L_lookahead, where T_local is the LP's current local time and L_lookahead is the lookahead increment. The probability of sending a message to another LP is determined by a random function similar to a coin toss: based on its binary result, the LP either sends a message to itself or to a neighbor. Each LP communicates with other LPs as shown in Figure 4. In the following test cases, 1000 LPs are mapped to a single processor.
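The benchmark's traffic pattern can be summarized in a few lines. This is our paraphrase of the description above, not the paper's source code, and the parameter names are ours.

    import random

    def next_message(lp_id, t_local, l_lookahead, neighbors, rng=random):
        """Coin toss decides between a self-message and a propagating message;
        either way the new event carries timestamp T_local + L_lookahead."""
        dest = lp_id if rng.random() < 0.5 else rng.choice(neighbors)
        return dest, t_local + l_lookahead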


Figure 4. An example network topology

To analyze TW-SMIP under asymmetric conditions, a series of experiments was performed by varying the HB frequency and workload. The HB frequency is the interval after which HB messages are generated by each LP. Asymmetric test conditions are achieved by varying the workload across the pool of machines used to gather data; processors may be lightly loaded or heavily loaded. Background jobs are generated using Stress, a workload generator for POSIX systems [24] that imposes a configurable amount of CPU, memory, I/O, and disk load on the system. Scenarios termed lightly loaded denote a load of two CPU-bound processes, one I/O-bound process, and one memory allocator process. Heavily loaded scenarios contain four CPU-bound processes, two I/O-bound processes, and one memory allocator process. In the following graphs, HB frequency denotes the heartbeat message interval in seconds.

Figure 5 shows runtime characteristics for the lightly loaded scenario. The committed event rate remains roughly constant across the HB frequency trials, varying between 0.5 and 1.0 million events. Under a lightly loaded uniform scenario, processing speed remains relatively constant across processors, incurring few rollbacks.

In the next scenario, the background load is non-uniform, i.e., some machines carry lightly loaded background processes while others do not. Figure 6 shows only small variations in the event and rollback rates. Even at high HB frequencies, the rollback rate does not vary significantly from that at lower HB frequencies.

A variety of other experiments were performed; due to space limitations, we omit those results. Figure 7 shows a comparison of efficiency (the ratio of committed events to total events processed) for several different background loads. Not surprisingly, non-uniformly distributed loads yield more rollbacks and reduced efficiency.
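Efficiency here is the simple ratio just defined. Rolled-back events count in the denominator but not the numerator, so rollbacks directly depress it:

    def efficiency(committed_events, total_processed_events):
        """Ratio of committed events to total events processed (committed
        plus rolled back). Returns 0.0 for an empty run."""
        if total_processed_events == 0:
            return 0.0
        return committed_events / total_processed_events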

Figure 5. Lightly loaded uniform scenario

Figure 6. Lightly loaded non-uniform scenario

Figure 7. Efficiency comparison between loads

A comparison between TW-SMIP and a traditional Time Warp synchronization mechanism is shown in Figure 8. A mix of uniform and non-uniform, lightly and heavily loaded conditions was used for the comparative study. In each test case, the HB frequency that yielded the best performance was used. In the lightly loaded uniform test case, the TW-SMIP approach provides significantly improved efficiency over a traditional Time Warp approach. Under the non-uniform test cases, the observed data also show significant performance improvements in both the lightly loaded and heavily loaded scenarios. For example, TW-SMIP exhibits an efficiency of 91% in the lightly loaded uniform test case, compared to 76% for the traditional Time Warp implementation.

Figure 8. TW-SMIP efficiency vs. traditional Time Warp synchronization

7. Conclusions and Future Work

Cloud computing offers the promise of providing an execution platform without exposing complicated details of PDES execution to users. However, it is well known that optimistic PDES programs under traditional Time Warp frameworks can perform poorly where resources are shared among many users, leading to asymmetric and uneven processing. We have shown that running optimistic simulations without any optimism control under asymmetric loads can lead to lower execution efficiency. In this paper we described a new protocol, TW-SMIP, that specifically addresses the asymmetric background loads typically found in cloud computing architectures. The protocol defines dynamic synchronization points for individual LPs based on straggler messages. Handling these straggler messages can improve the efficiency of the system, which in turn leads to improved resource utilization by lessening the amount of rolled-back computation.

This work represents only an initial step toward executing PDES codes in cloud computing environments, examining the question of synchronization. There are many avenues for future research. Evaluation of this approach for a much broader range of PDES applications is required. Implementation of simulation application codes in cloud environments has not been studied, and development frameworks and tools are needed. A mechanism for auto-tuning the HB frequency to improve the convenience of the Time Warp platform for users is an area that will require further study. The interaction of fault tolerance mechanisms with Time Warp execution, and the exploitation of state saving mechanisms within Time Warp itself for fault tolerance purposes, merit further investigation.

Acknowledgement

Funding for this research was provided in part by NSF Grant ATM-0326431.

References

1. Fujimoto, R., Parallel and Distributed Simulation Systems. 2000: Wiley Interscience.

2. Fujimoto, R.M., et al. Large-Scale Network Simulation -- How Big? How Fast? in Modeling, Analysis and Simulation of Computer and Telecommunication Systems. 2003.

3. Lendermann, P., et al. An Integrated and Adaptive Decision-Support Framework for High-Tech Manufacturing and Service Networks. in Proceedings of the 2005 Winter Simulation Conference. 2005.

4. Perumalla, K.S. A Systems Approach to Scalable Transportation Network Modeling. in Winter Simulation Conference. 2006. Monterey, CA: IEEE.

5. Jefferson, D., Virtual Time. ACM Transactions on Programming Languages and Systems, 1985. 7(3): p. 404-425.

6. Hewitt, C., ORGs for Scalable, Robust, Privacy-Friendly Client Cloud Computing. Internet Computing, IEEE, 2008. 12(5): p. 96-99.

7. Aymerich, F.M., G. Fenu, and S. Surcis. An Approach to a Cloud Computing Network. in First International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2008.

8. Aymerich, F.M., G. Fenu, and S. Surcis. An Approach to a Cloud Computing Network. in First International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2008.

9. Guo, H., et al., The Application of Utility Computing and Web-Services to Inventory Optimisation, in Proceedings of the 2005 IEEE International Conference Services Computing (SCC’05). 2005. p. 185-191.

10. Keahey, K., K. Doering, and I. Foster. From sandbox to playground: dynamic virtual environments in the grid. in Proceedings of the 5th International Workshop on Grid Computing. 2004.

11. OpenNebula Project. Apr. 2008; Available from: http://www.opennebula.org.

12. Lizhe, W., et al. Scientific Cloud Computing: Early Definition and Experience. in 10th IEEE International Conference on High Performance Computing and Communications, HPCC. 2008.

13. Carothers, C.D. and R.M. Fujimoto, Efficient Execution of Time Warp Programs on Heterogeneous, NOW Platforms. IEEE Transactions on Parallel and Distributed Systems, 2000. 11(3): p. 299-317.

14. Carothers, C.D., R.M. Fujimoto, and P. England, The Effect of Communication Overheads on Time Warp Performance, in Workshop on Parallel and Distributed Simulation. 1994.

15. Sokol, L.M., D.P. Briscoe, and A.P. Wieland. MTW: a Strategy for Scheduling Discrete Simulation Events for Concurrent Execution. in Proc. of the SCS Multiconference on Distributed Simulation. 1988.

16. Steinman, J.S. Breathing Time Warp. in PADS '93: Proceedings of the seventh workshop on Parallel and distributed simulation. 1993: ACM.

17. Ball, D. and S. Hoyt. The Adaptive Time-Warp Concurrency Control Algorithm. in Proceedings of the SCS Multiconference on Distributed Simulation, Simulation Series, Society for Computer Simulation. 1990. San Diego, California.

18. Park, A. and R.M. Fujimoto, Aurora: An Approach to High Throughput Parallel Simulation, in PADS '06: Proceedings of the 20th Workshop on Principles of Advanced and Distributed Simulation. May 2006, IEEE Computer Society.

19. Park, A. and R. Fujimoto. A scalable framework for parallel discrete event simulations on desktop grids. in 2007 8th IEEE/ACM International Conference on Grid Computing. Sep 2007.

20. Chen, D., S.J. Turner, and W. Cai. A Framework for Robust HLA-based Distributed Simulations. in International Workshop on Principles of Advanced and Distributed Simulation. 2006. Singapore.

21. Hadoop. Available from: http://www.hadoop.apache.org.

22. Dean, J. and S. Ghemawat, MapReduce: simplified data processing on large clusters. Communications of the ACM, Jan 2008. 51(1).

23. Madisetti, V.K., D.A. Hardaker, and R.M. Fujimoto, The MIMDIX Operating System for Parallel Simulation and Supercomputing. Journal of Parallel and Distributed Computing, 1993. 18(4): p. 473-483.

24. Stress Library. Available from: http://weather.ou.edu/~apw/projects/stress.
