Distributed Simulation and Real Time Applications 2009
October 25, 28 - Singapore

Benchmarking Memory Management Capabilities within ROOT-Sim

Roberto Vitali, Alessandro Pellegrini, Francesco Quaglia

Motivations for the Work

• We have designed and implemented a fully featured Memory Management Subsystem for optimistic PDES platforms (Di-DyMeLoR [PADS 2009]):
  - Targeted at C-based platforms hosted by CISC architectures (i386, x86-64)
  - Capable of supporting incremental logging with arbitrary granularity (based on transparent code instrumentation)
  - Allows Simulation Objects’ Memory Maps to change dynamically, via standard malloc/free services

• In this work we provide accurate benchmarking results for assessing the effectiveness of such a subsystem
  - This entails the definition and implementation of an adequate benchmark application

Motivations for the Work (2)

• The significance of this study lies in the fact that benchmarking results in the literature for Memory Management Subsystems with incremental capabilities:
  - Did not cope with dynamic memory maps
  - Have only been targeted at RISC systems (no coverage of complex instruction sets)
  - Are about 10 years old (no coverage of current technological trends)

Objectives

• Show the actual overhead, due to memory update tracking mechanisms, added to the execution in a parallel and distributed optimistic simulation environment relying on current technological trends.

• Develop an effective benchmark to assess the performance of dynamic Memory Management Subsystems, since no such benchmark exists in our context.

Work Path

• The most widely known benchmark for PDES Systems is PHOLD

• Traditionally, it has been used to evaluate PDES platforms as a whole (e.g., to evaluate the effects of the selected Synchronization Scheme)

• We have provided extra specifications to PHOLD in order to explicitly cope with the evaluation of memory management capabilities in Optimistic Systems

• The implementation reflects standard libraries’ code in the execution of memory access tasks:
  - E.g., writing in contiguous memory regions is performed by exploiting string instructions, such as stos.

Reference Operating Architecture

• Rome Optimistic Simulator (ROOT-Sim):
  - Based on ANSI-C/POSIX technology and the MPI standard
  - Transparent support of housekeeping operations typical of optimistic simulation environments (e.g., objects mapping and scheduling)
  - Based on the notion of event handlers and event injection services

Reference Operating Architecture (2)

[Architecture diagram; highlighted component: Memory Management Subsystem]

Memory Management Subsystem

• Dynamic Memory Logger and Restorer (DyMeLoR):
  - Based on ANSI-C wrapped malloc/free services
  - Provides log/restore facilities for dynamic-memory-based objects’ states, transparently to the application-level programmer
  - Supports dynamic memory chunks’ contiguity for the same object
  - Allocation/deallocation operations are guaranteed to be Piece-Wise-Deterministic

Memory Management Subsystem (2)

• Based on Static Software Instrumentation:
  - Compile-time disassembling and rewriting of the application level executable generated by standard compilers (e.g., gcc)
  - Transparent injection of memory-access tracing routines
  - Disassembling data are cached into compile-time generated tables, to reduce memory-access tracing overhead

Memory Management Subsystem (3)

• Statically inserts calls to the update tracking routine and generates data tables

• Traces the execution of those instructions involving a memory update

• Keeps track of intra-checkpoint memory updates

• Allows faster interception of memory updates

Parser / Modifier

• By parsing the application-level code’s byte stream, it identifies all those instructions involving a memory write

• Prepends to each of them a call to the update_tracker module

• Extracts all relevant information

• Builds the Disassembling Table

• Corrects all the static and dynamic references.

update_tracker

• Written in Assembly language to optimize performance

• When called, it exploits the Disassembling Table generated at compile time together with the CPU registers to compute the actual memory-write destination address

• Triggers the Memory Map Manager to keep track of the new memory update
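
The address computation described above can be sketched as follows. This is a Python model for illustration, not the actual assembly routine: the `TableEntry` layout, its field names and `write_destination` are assumptions, chosen to follow the usual x86 effective-address form base + index × scale + displacement.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TableEntry:
    """Hypothetical per-instruction record cached at compile time."""
    base: Optional[str]    # name of the base register, or None
    index: Optional[str]   # name of the index register, or None
    scale: int             # 1, 2, 4 or 8
    displacement: int      # constant offset encoded in the instruction
    size: int              # number of bytes written by the instruction


def write_destination(entry: TableEntry, regs: Dict[str, int]) -> Tuple[int, int]:
    """Combine the cached entry with live register values.

    Returns (address, size) of the memory write about to be executed.
    """
    address = entry.displacement
    if entry.base is not None:
        address += regs[entry.base]
    if entry.index is not None:
        address += regs[entry.index] * entry.scale
    return address, entry.size


# Example: a 4-byte store to 8(%rbx,%rcx,4), i.e. rbx + rcx*4 + 8.
entry = TableEntry(base="rbx", index="rcx", scale=4, displacement=8, size=4)
destination = write_destination(entry, {"rbx": 0x1000, "rcx": 3})
```

Caching the decoded fields means that only the register values need to be read at run time, which is the motivation the slides give for the compile-time tables.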

Memory Map Manager

• For each simulation object a meta data table is maintained (different entries handle different chunk sizes, between 32 B and 32 KB)

• Each entry keeps information about a block of contiguous preallocated memory chunks

• Block structures include a status bitmap, to keep track of allocated chunks, and a dirty bitmap, to keep track of updated chunks

[Diagram labels: base_state_address, state_layout_info, malloc_area, status bitmap, dirty bitmap, chunk, preallocated block of contiguous chunks]
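
A minimal Python model of such a meta data table is sketched below. The class and function names (`MallocArea`, `build_table`, `area_for`), the power-of-two progression of chunk sizes and the number of chunks per block are assumptions made for illustration; the slides only state the 32 B to 32 KB range and the two bitmaps.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MallocArea:
    """One table entry: a preallocated block of equally sized chunks."""
    chunk_size: int
    num_chunks: int
    status: List[bool] = field(init=False)  # True = chunk is allocated
    dirty: List[bool] = field(init=False)   # True = chunk was updated

    def __post_init__(self) -> None:
        self.status = [False] * self.num_chunks
        self.dirty = [False] * self.num_chunks

    def allocate(self) -> Optional[int]:
        """Return the index of the first free chunk, or None if the block is full."""
        for i, used in enumerate(self.status):
            if not used:
                self.status[i] = True
                return i
        return None

    def release(self, chunk: int) -> None:
        self.status[chunk] = False


def build_table(min_size: int = 32, max_size: int = 32 * 1024,
                num_chunks: int = 4) -> List[MallocArea]:
    """One entry per chunk size (doubling sizes are an assumption)."""
    table = []
    size = min_size
    while size <= max_size:
        table.append(MallocArea(size, num_chunks))
        size *= 2
    return table


def area_for(table: List[MallocArea], request: int) -> Optional[MallocArea]:
    """Pick the entry with the smallest chunk size that fits the request."""
    for area in table:
        if area.chunk_size >= request:
            return area
    return None


table = build_table()
area = area_for(table, 100)   # a 100-byte request is served by 128-byte chunks
first = area.allocate()
```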

Memory Map Manager (2)

• When triggered, it matches the area and the chunks involved by the write operation

• Marks involved chunks as dirty, and updates all the relevant meta data

• All operations concerning areas outside the object’s state (e.g., global variables) are simply discarded
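
The matching step can be sketched as below, for a single area. This is an illustrative Python model: `chunks_touched` and `mark_dirty` are hypothetical names, and treating a write that crosses the area boundary as falling outside the area is a simplifying assumption.

```python
from typing import List


def chunks_touched(area_base: int, chunk_size: int, num_chunks: int,
                   address: int, size: int) -> List[int]:
    """Indices of the chunks overlapped by a write of `size` bytes at `address`.

    An empty list means the write does not fall inside this area,
    so it is discarded as far as this area is concerned.
    """
    area_end = area_base + chunk_size * num_chunks
    if size <= 0 or address < area_base or address + size > area_end:
        return []
    first = (address - area_base) // chunk_size
    last = (address + size - 1 - area_base) // chunk_size
    return list(range(first, last + 1))


def mark_dirty(dirty: List[bool], touched: List[int]) -> None:
    """Set the dirty bit of every chunk involved in the write."""
    for i in touched:
        dirty[i] = True


# An area of eight 64-byte chunks starting at 0x1000.
dirty = [False] * 8
inside = chunks_touched(0x1000, 64, 8, address=0x1000 + 60, size=8)
mark_dirty(dirty, inside)                      # the write straddles chunks 0 and 1
outside = chunks_touched(0x1000, 64, 8, address=0x5000, size=8)
mark_dirty(dirty, outside)                     # nothing to mark
```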

Memory Map Manager (3)

• Incremental State Log Operations:
  - A log operation results in packing the information to be logged into a contiguous memory buffer
  - A malloc_area, together with its Status Bitmap, is only copied if it was updated since the last log/restore operation
  - Dirty Bitmaps are only copied if at least one chunk has been updated since the last log/restore operation
  - Every dirty chunk is also copied into the log buffer
  - Periodically, a full snapshot of the state is taken
  - Logs are organized into a chain, ordered with respect to the Logical Simulation Time they were taken at
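
The log operation can be modeled roughly as follows. This Python sketch makes several assumptions: a dictionary stands in for the contiguous log buffer, the state is a flat list of chunks, bitmap copying is omitted, and `FULL_LOG_PERIOD`, `take_log` and `log_state` are hypothetical names.

```python
from typing import Dict, List

FULL_LOG_PERIOD = 5  # hypothetical: one full snapshot every 5 log operations


def take_log(chunks: List[bytes], dirty: List[bool],
             timestamp: float, full: bool) -> Dict:
    """Save all chunks (full log) or only the dirty ones (incremental log)."""
    if full:
        saved = dict(enumerate(chunks))
    else:
        saved = {i: chunks[i] for i, is_dirty in enumerate(dirty) if is_dirty}
    dirty[:] = [False] * len(dirty)   # a new tracking interval starts here
    return {"time": timestamp, "full": full, "chunks": saved}


def log_state(chain: List[Dict], chunks: List[bytes],
              dirty: List[bool], timestamp: float) -> None:
    """Append a log to the chain, which stays ordered by simulation time."""
    full = len(chain) % FULL_LOG_PERIOD == 0
    chain.append(take_log(chunks, dirty, timestamp, full))


chunks = [b"aaaa", b"bbbb", b"cccc"]
dirty = [False, False, False]
chain: List[Dict] = []

log_state(chain, chunks, dirty, 0.0)   # first log is a full snapshot
chunks[1] = b"BBBB"                    # the application updates chunk 1 ...
dirty[1] = True                        # ... and the tracker marks it dirty
log_state(chain, chunks, dirty, 1.0)   # incremental log: chunk 1 only
```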

Memory Map Manager (4)

• Incremental Restore State Operations:
  - When a Restore Operation needs to be executed at simulation time T, the log chain is traversed backward to determine the most recent log with timestamp less than or equal to T
  - The Restore Operation is performed with an iterative procedure which scans the logs along the chain
  - The operation halts whenever the memory map is completely restored (i.e., when a full log is encountered)
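
The restore procedure can be sketched as below. This is an illustrative Python model under stated assumptions: each log is a dictionary with `time`, `full` and `chunks` keys (a hypothetical layout), the chain is ordered by time, and changes to the set of allocated chunks between logs are ignored.

```python
from typing import Dict, List


def restore(chain: List[Dict], t: float) -> List[bytes]:
    """Rebuild the state valid at simulation time t from a time-ordered chain."""
    # Step 1: backward traversal to the most recent log with timestamp <= t.
    start = None
    for pos in range(len(chain) - 1, -1, -1):
        if chain[pos]["time"] <= t:
            start = pos
            break
    if start is None:
        raise ValueError("no log at or before the requested time")

    # Step 2: keep scanning backward; newer logs take precedence, older
    # logs only fill in chunks not seen yet. Stop at the first full log.
    state: Dict[int, bytes] = {}
    for pos in range(start, -1, -1):
        log = chain[pos]
        for index, data in log["chunks"].items():
            state.setdefault(index, data)
        if log["full"]:
            return [state[i] for i in sorted(state)]
    raise ValueError("the chain contains no full log to stop at")


chain = [
    {"time": 0.0, "full": True, "chunks": {0: b"a", 1: b"b", 2: b"c"}},
    {"time": 1.0, "full": False, "chunks": {1: b"B"}},
    {"time": 2.0, "full": False, "chunks": {2: b"C"}},
]
state_at_1_5 = restore(chain, 1.5)
```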

The Benchmark

• The Benchmark is derived from PHOLD:
  - Fictitious events are executed, involving the advancement of the local simulation clock
  - Upon event execution, a new event is scheduled, destined to any object in the system

• Each Simulation Object’s state contains a set of N pointers for accessing N distinct linked lists of buffers relying on dynamic memory allocation

The Benchmark (2)

• Different lists keep track of buffers with different sizes, in between a min and max

• Denoting as size(i) the exact size of the buffers inside the i-th list, at setup time the S bytes of the state are allocated according to the following rule:
  - $S / N$ bytes are destined for buffer allocation inside each list
  - $S / (N \cdot \mathit{size}(i))$ buffers are allocated for the i-th list, and linked together

• There is a bias towards the number of buffers associated with smaller sizes, to mimic the common scenario where applications tend to allocate a large number of small buffers
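
The setup rule, S / N bytes per list and S / (N · size(i)) buffers in the i-th list, can be worked through numerically. In this Python sketch the power-of-two sizes, the 1 MiB state size and the use of integer division are assumptions made for the example; `buffers_per_list` is a hypothetical helper name.

```python
from typing import List


def buffers_per_list(S: int, sizes: List[int]) -> List[int]:
    """Number of buffers in each list: S / (N * size(i)), rounded down."""
    N = len(sizes)
    return [S // (N * size) for size in sizes]


S = 1024 * 1024                                    # total state size in bytes
sizes = [32, 64, 128, 256, 512, 1024, 2048, 4096]  # N = 8 buffer sizes
counts = buffers_per_list(S, sizes)
bytes_per_list = [n * size for n, size in zip(counts, sizes)]
```

Every list receives the same number of bytes, so the list with the smallest buffers holds the most of them, which is the bias towards small buffers mentioned above.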

The Benchmark (3)

• The benchmark logic provides two events:
  - BUFFER_ALLOCATE[size]: upon its execution, a new buffer is allocated and linked to the i-th local list, associated with size(i) = size
  - BUFFER_DEALLOCATE[null]: upon its execution at time t:
    • A size value is randomly selected from the pool of size(i) possibilities
    • A random buffer in the list associated with size(i) = size (if any) gets released
    • A new BUFFER_ALLOCATE[size] event is scheduled for any simulation object, at the same time t
    • A new BUFFER_DEALLOCATE[null] event is scheduled for the same simulation object, at time t + inc
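
The two events can be modeled with a small sequential event loop, sketched below in Python. This is not ROOT-Sim code: the `simulate` function, the heap-based queue, drawing inc from an exponential distribution, and scheduling BUFFER_ALLOCATE only when a buffer was actually released are all assumptions made for illustration.

```python
import heapq
import random
from itertools import count

ALLOCATE, DEALLOCATE = "BUFFER_ALLOCATE", "BUFFER_DEALLOCATE"


def simulate(num_objects, sizes, buffers_per_list, mean_inc, end_time, seed=0):
    """Run the two-event benchmark logic sequentially and return the final state."""
    rng = random.Random(seed)
    # state[obj][size] models the linked list of buffers of that size.
    state = [{s: [bytearray(s) for _ in range(buffers_per_list)] for s in sizes}
             for _ in range(num_objects)]
    queue, order = [], count()   # `order` breaks ties between equal timestamps

    def schedule(time, obj, kind, size=None):
        heapq.heappush(queue, (time, next(order), obj, kind, size))

    for obj in range(num_objects):
        schedule(rng.expovariate(1.0 / mean_inc), obj, DEALLOCATE)

    while queue:
        time, _, obj, kind, size = heapq.heappop(queue)
        if time > end_time:
            break
        if kind == ALLOCATE:
            state[obj][size].append(bytearray(size))
        else:
            size = rng.choice(sizes)
            buffers = state[obj][size]
            if buffers:
                buffers.pop(rng.randrange(len(buffers)))
                # The released buffer "migrates" to a random object at time t.
                schedule(time, rng.randrange(num_objects), ALLOCATE, size)
            schedule(time + rng.expovariate(1.0 / mean_inc), obj, DEALLOCATE)
    return state


final_state = simulate(num_objects=4, sizes=[32, 64, 128],
                       buffers_per_list=10, mean_inc=1.0, end_time=50.0)
total_bytes = sum(len(b) for obj in final_state
                  for buffers in obj.values() for b in buffers)
```

Because every release is paired with an allocation of the same size at the same simulation time, the total number of bytes held by all objects stays constant in this model.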

The Benchmark (4)

• The differentiation in the two types of events implies that we are migrating buffers across the different simulation objects, with exponentially distributed migration rate

• At each simulation time, the total memory used by the simulation objects is constant, thus reflecting the specific space complexity of the simulation model that the current benchmark configuration mimics

The Benchmark (5)

• Read/write accesses into the buffers of the objects’ states have been associated with the execution of the fictitious events

• The benchmark is able to emulate read-intensive vs write-intensive applications:
  - The more write intensive the event, the larger the number of chunks updated
  - This makes it possible to observe how the costs of memory-write tracking and log/restore operations scale vs the ROOT-Sim implementation

The Benchmark (6)

• An additional parameter (x ≤ S) indicates the total amount of bytes to be read/written:
  - A breadth-first visit on the lists has been adopted:
    • When executing an event we randomly select a list to start the visit from
    • All the content in the buffer at the head of the list is touched in read/write mode
    • Other lists are accessed according to a circular policy, moving to the next buffers on subsequent accesses
    • Until exactly x bytes have been touched

• The breadth-first visit mimics a worst case scenario for log/restore facilities offered by ROOT-Sim: write operations are not localized into a few malloc_areas
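
One way to read the breadth-first visit is sketched below in Python. The function name, the use of Python lists in place of linked lists, and the read-modify-write used to "touch" a byte are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def breadth_first_touch(lists: List[List[bytearray]], x: int,
                        rng: random.Random) -> List[Tuple[int, int, int]]:
    """Touch exactly x bytes, visiting the lists level by level.

    Returns the visit order as (list index, buffer position, bytes touched).
    """
    n = len(lists)
    start = rng.randrange(n)          # random list to start the visit from
    visited = []
    remaining = x
    depth = 0                         # 0 = heads of the lists, 1 = next buffers, ...
    while remaining > 0:
        progressed = False
        for k in range(n):            # circular policy over the lists
            li = (start + k) % n
            if depth >= len(lists[li]):
                continue
            buf = lists[li][depth]
            touch = min(len(buf), remaining)
            for j in range(touch):
                buf[j] = (buf[j] + 1) & 0xFF   # read, then write back
            visited.append((li, depth, touch))
            remaining -= touch
            progressed = True
            if remaining == 0:
                break
        if not progressed:
            raise ValueError("x exceeds the total size of the buffers")
        depth += 1
    return visited


lists = [[bytearray(4), bytearray(4)], [bytearray(8), bytearray(8)]]
visit = breadth_first_touch(lists, x=14, rng=random.Random(0))
```

With x = 14 the visit touches both head buffers (4 + 8 bytes) before moving to a second-level buffer for the last 2 bytes, so the writes are spread over both lists rather than concentrated in one.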

Measures Performed

• Test Platform:
  - Quad-Core machine equipped with four 2.4GHz/4MB-Cache 64-bit Intel processors
  - 4 GB of RAM
  - One ROOT-Sim simulation kernel per processor

• Four simulation objects (one per core)

• Performed tests require each simulation object to execute at least 10,000 buffer allocations, scattered over 8 different buffer chains with sizes ranging from 32 Bytes to 4KB

• The parameter x has been varied in order to generate read/write operations spanning between 20% and 80% of the whole size of the simulation object

Measures Performed (2)

• Event Latency, Checkpoint Latency, Restore Latency and Memory Usage (per checkpoint) have been measured

• Different interleaving steps between full and incremental logs have been selected, taking full logs every 5 or every 20 log operations.

• Similar measurements have been performed by excluding software instrumentation and related incremental log capabilities:
  - By linking a previous memory map manager, with a similar structure except that no memory-write tracking is supported

Experimental Results: Events

• The tracking mechanisms used to identify regions involved in update operations add an overhead to the event execution

• Nevertheless, this overhead is relatively limited for write operations spanning up to 40% of the state

• When the state increases in size, the overhead gets relatively reduced

[Plots, one panel each for 10 KB, 100 KB and 1024 KB]

Experimental Results: Log

• The event processing overhead of the instrumented software is moreover counterbalanced by reduced checkpoint latency

• The capability for such a checkpoint has great relevance in scenarios with applications that are not Piece-Wise-Deterministic

[Plots, one panel each for 10 KB, 100 KB and 1024 KB]

Experimental Results: Restore

• The non-instrumented configuration typically provides gains in state restore operations

• State restore latency directly depends on the interleaving between full logs and incremental logs along the log chain

• The performance decrease can be controlled via proper selection of a non-oversized interleaving step

[Plots, one panel each for 10 KB, 100 KB and 1024 KB]
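
The dependence of restore cost on the interleaving step can be illustrated with a small count of the logs a restore has to scan. This Python sketch is not a measurement: it assumes that a full log is taken at every position that is a multiple of the step, and that the restore target is equally likely to be any position within a period. The function names are hypothetical.

```python
def logs_scanned(position: int, step: int) -> int:
    """Logs visited when restoring from the log at `position` (0-based),
    scanning backward until the closest full log (positions 0, step, 2*step, ...)."""
    return position % step + 1


def average_scanned(step: int) -> float:
    """Average number of logs scanned over one full/incremental period."""
    return sum(logs_scanned(p, step) for p in range(step)) / step


averages = {step: average_scanned(step) for step in (5, 20)}
```

Under these assumptions, moving from a step of 5 to a step of 20 raises the average chain length from 3 to 10.5 logs, which is why an oversized interleaving step penalizes restore operations.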

Experimental Results: Memory

• Memory requirements for each log operation in the instrumented case are definitely lower than those observed for non-instrumented software

• This further strengthens the case for the fully featured incremental version of the software for applications with very large memory requirements for the objects’ states

[Plots, one panel each for 10 KB, 100 KB and 1024 KB]

Summary

• We have developed a synthetic benchmark to assess the memory management capabilities offered by the optimistic parallel simulation environment ROOT-Sim (based on C technology)

• We have focused on incremental log/restore aspects and on software instrumentation techniques

• This has been done to evaluate the efficiency and effectiveness of supports to high performance simulation systems, which are important in contexts with, e.g., temporal constraints

• The targeted system is representative of platforms hosted by modern CISC machines

Planned future work

• Evaluation of ROOT-Sim with differentiated application programming patterns
  - Use of a large spectrum of simulation models from the real world
  - Use of unoptimized vs optimized machine code for memory read and write operations (tradeoffs between programmer skills vs compiler automatic optimizations)

Thanks!!

Questions?