Optimizing Memory Management for Optimistic Simulation ...pellegrini/talks/hpcs16.pdfLPj LPk Execution Time Execution Time Execution Time 4 of 18 - Optimizing Memory Management for

Optimizing Memory Management for Optimistic Simulationwith Reinforcement Learning

Alessandro Pellegrini

Sapienza, University of Rome

HPCS 2016

Context

• Simulation is a powerful technique to explore complex scenarios

• Parallel Discrete Event Simulation (PDES) has been applied to alarge set of research fields

• Speculative Simulation (Time Warp-based) is proven to beeffective to deliver high performance simulations

• Ensuring consistency of speculative simulation requires effort

• Transparency towards the application-model developer is critical

2 of 18 - Optimizing Memory Management for Optimistic Simulation with Reinforcement Learning

Organization of a Time Warp Kernel

Communication Network

Machine

CPU

Kernel

LPLP

LP LPLP

LP LPLP

LP LPLP

LP

...

...

CPU CPU CPU

Machine

CPU

Kernel

...CPU CPU CPU

Kernel


The Synchronization Problem

LPi

LPj

LPk Execution Time

Execution Time

Execution Time



LPi

LPj 15

5

LPk Execution Time7

10 Execution Time

Execution Time



LPi

LPj 15

5

LPk Execution Time7 17

10

17

Execution Time

Execution Time



LPi

LPj 15

5 10

20

LPk Execution Time7 17 25

10

17

Execution Time

Execution Time



LPi

LPj 15

5 10

20

12


10

17

Execution Time

Execution Time



LPi

LPj 15

5 10

20

Straggler Message

12


10

17

Rollback Execution:

Recovering state at

LVT 10

Execution Time

Execution Time



LPi

LPj 15

5 10

20

Straggler Message

12


10

17 17

Anti-message

anti-message

reception

Rollback Execution:

Recovering state at

LVT 10

Rollback Execution:

Recovering State at

LVT 7

Execution Time

Execution Time



LPi

LPj 15

5 10

20 12

Straggler Message

12


10

25

17 17

Anti-message

anti-message

reception

Rollback Execution:

Recovering state at

LVT 10

Rollback Execution:

Recovering State at

LVT 7

Execution Time

Execution Time


Memory Management and Rollbacks

• How can a runtime environment restore a state?

• It has to know the complete memory map of each LP

• It should take “sometimes” a snapshot of that map

• The snapshot could be either full or incremental

• Memory management is fundamental to Time Warp systems◦ Too many snapshots: memory/latency inefficiency◦ Too few: rollbacks are long!◦ Full vs incremental: how to decide?


Memory Management and Rollbacks

• How can a runtime environment restore a state?

• It has to know the complete memory map of each LP

• It should take “sometimes” a snapshot of that map

• The snapshot could be either full or incremental

• Memory management is fundamental to Time Warp systems◦ Too many snapshots: memory/latency inefficiency◦ Too few: rollbacks are long!◦ Full vs incremental: how to decide?


Our memory management organization

• Transparency◦ Interception of memory-related operations (no platform APIs)◦ No application-level procedure for (incremental) log/restore tasks

• Optimism-Aware Runtime Supports◦ Recoverability of generic memory operations: allocation, deallocation,

and updating

• Incrementality◦ Cope with memory “abuse” of speculative rollback-based

synchronization schemes◦ Enhance memory locality



• Lightweight software instrumentation◦ Optimized memory-write access tracing and logging◦ Arbitrary-granularity memory-write tracing◦ Concentration of most of the instrumentation tasks at a pre-running

stage:

• No costly runtime dynamic disassembling

• Standard API wrappers◦ Code can call standard malloc services◦ Memory map transparently managed by the simulation platform



malloc_area

malloc_area

... Block status

bitmap

Dirty

bitmap

chunk

chunk

...

• Memory (for each LP) is pre-allocated

• Requests are served on a chunk basis

• Explicit avoidance of per-chunk metadata◦ Block status bitmap: tracks used chunks◦ Dirty bitmap: tracks updated chunks since last log



• We use Hijacker to track memory-update instructions

mov $3, xoriginal memory

update

jmp *%eaxindirect branch

mov $3, x

jmp .Jump

call track

jmp 0xXXXX.Jump:

push struct

new writeable section regular jump modi ed

by branch_corrector

call corrector

Instrumentation Process

Original Executable Final executable

push struct

regular jump

• Multi-code packs two different version of the program9 of 18 - Optimizing Memory Management for Optimistic Simulation with Reinforcement Learning

Memory management self-optimization

• To optimize the memory manager we have to determine:◦ When to take a snapshot◦ Its mode (incremental vs incremental)

• But to take an incremental log, tracing must be active

• Traditional approaches are based on analytic models◦ Periodic recomputation (e.g., checkpoint interval)◦ Non-responsive if dynamics change fast◦ Might not capture secondary effects

• We use reinforcement learning


Memory management self-optimization

• To optimize the memory manager we have to determine:◦ When to take a snapshot◦ Its mode (incremental vs incremental)

• But to take an incremental log, tracing must be active

• Traditional approaches are based on analytic models◦ Periodic recomputation (e.g., checkpoint interval)◦ Non-responsive if dynamics change fast◦ Might not capture secondary effects

• We use reinforcement learning


Reinforcement Leraning-based self-optimization

• An agent takes an action in the environment depending on thecurrent state

• An a-posteriori reward tells whether the choice was good

• Previous decisions affect future ones (we learn from history!)

• With some random probability, we ignore history and explore

• We take an action after the execution of each event

• After some knowledge has been acquired, the system can becomevery responsive


States and actions

Actions: Monitored, Unmonitored, Checkpoint


The reward

• We want to reduce the time spent in non-necessary tasks

• We define the expected computation loss:

Γ = E[∫ T

0X (t)dt

]• where:

X (t) =

0 if x = Non-IncrementalδM

(δe+δM) if x = Incremental

1 − γ if x = CKPTI

1 if x = CKPTF

1 if x = Rollback


Experimental Setup

• 64-bit NUMA machine, 24 cores, 32GB of RAM

• SuSe Enterprise 11, Linux 2.6.32.13

• GSM coverage simulation model

• High fidelity model (fading, power regulation, meteorologicalconditions)

• Ring highway coverage

• 1000 channels per cell

• Variable call interarrival (simulation of one week of traffic)


Experimental Results

0

50000

100000

150000

200000

250000

10000 30000 50000 70000

Cum

ulat

ed C

omm

itted

Eve

nts

Wall-clock-time (sec)

Overall Execution Speed

Q-LearningIncremental

Full



0

20000

40000

60000

80000

100000

120000

RL Genetic

Total execution time



0

5

10

15

20

25

30

35

40

200 400 600 800 1000Eve

nts

Com

mitt

ed b

etw

een

Che

ckpo

ints

Logical Processes

Checkpoint Interval


Thanks for your attention

Questions?

[email protected]

http://www.dis.uniroma1.it/∼pellegrini

https://github.com/HPDCS/ROOT-Sim


Documents

Optimizing Memory Management for Optimistic Simulation ...pellegrini/talks/hpcs16.pdfLPj LPk Execution Time Execution Time Execution Time 4 of 18 - Optimizing Memory Management for