47
A Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS 9539, 2015, 1-30 Modellierung Dynamischer und Adaptiver Systeme WS 2018/19 1

A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

A Framework for

Simulation-Based Online Planning

Martin Wirsing

In Kooperation mit Lenz Belzner und Rolf Hennicker,

FACS 2015, LNCS 9539, 2015, 1-30

Modellierung Dynamischer und Adaptiver Systeme

WS 2018/19

1

Page 2: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Autonomous Systems

Autonomous systems

have to adapt to

environmental conditions and

new requirements

at runtime even if they are defined at design time

ASCENS project

2010-2015, EU-funded Integrated Project

15 partners from 7 countries

Developed systematic approach for

engineering autonomous ensembles including

SW process, formal modeling, verification,

monitoring, adaptation, awareness

Case studies on

robotics, cloud computing, e-mobility

2

Page 3: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Decision Making under Uncertainty

Very large state spaces 𝑆 > 1010

Probabilistic effects

Partially uncontrolled environment

Incomplete design time knowledge

3

Page 4: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Contents

1. Online planning

2. A generic framework for online planning

3. Simulation-based online planning

1. The framework

2. Monte Carlo Tree Search for discrete domains

3. Cross Entropy for continuous domains:

4. Concluding remarks

4

Page 5: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

1. Online Planning

5

Page 6: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Online Planning

Real Situation

build State Model

and plan

observe

execute

Image sources:

thegrid.soup.io/post/312159914

mobots.epfl.ch/marxbot.html

6

Page 7: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Online Planning

Real Situation

build State Model

and plan

observe

execute

Image sources:

thegrid.soup.io/post/312159914

mobots.epfl.ch/marxbot.html

7

Page 8: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Online Planning (Informally, Sequential)

while true do

observe state

plan

execute action w.r.t. plan

end while

8

Page 9: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Online Planning (Informally, Concurrent)

while true do

observe state

execute || plan

end while

9

Page 10: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Online Planning: Parameters

State space 𝑆

Action space 𝐴

Attribute 𝑎𝑐𝑡𝑖𝑜𝑛𝑅𝑒𝑞𝑢𝑖𝑟𝑒𝑑 ∶ 𝐴𝑔𝑒𝑛𝑡 → 𝐵𝑜𝑜𝑙

Operation 𝑜𝑏𝑠𝑒𝑟𝑣𝑒 ∶ 𝐴𝑔𝑒𝑛𝑡 → 𝑆

Operation execute : RealAction → ()

Planning

Reward function 𝑅 ∶ 𝑆 → ℝ => getReward

Strategy 𝑃𝐴𝑐𝑡𝑖𝑜𝑛 𝐴 𝑆) => sampleAction Planning refines initial strategy according to 𝑅

Online planning

Iterated execution and planning

10

Page 11: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Online Planning (Refined)

while true do

state ← observe()

planner.state ← state

when actionRequired do

actionRequired ← false

action ← planner.strategy.sampleAction(state)

end when

action.real.execute()

end while

while true do

plan()

end while

Agent

Planner

Agent || Planner where

11

Page 12: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Plug Points

while true do

state ← observe()

planner.state ← state

when actionRequired do

actionRequired ← false

action ← planner.strategy.sampleAction(state)

end when

action.real.execute()

end while

while true do

plan()

end while

Agent

Planner

Agent || Planner where domain

specific

12

Page 13: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Plug Points

while true do

state ← observe()

planner.state ← state

when actionRequired do

actionRequired ← false

action ← planner.strategy.sampleAction(state)

end when

action.real.execute()

end while

while true do

plan()

end while

Agent

Planner

Agent || Planner where

design

choice

domain

specific

13

Page 14: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

A Framework for Online Planning

Real Situation

14

Page 15: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

A Framework for Online Planning

Operates w.r.t.

state and strategy

𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺)

Real Situation

15

Page 16: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺) Changes strategy

w.r.t. reward function

Real Situation

A Framework for Online Planning

16

Page 17: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

3. Simulation-Based Online Planning

17

Page 18: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Three Types of State

Real Situation

State Model

Simulation

18

Page 19: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Approach

Refine strategy 𝑃𝐴𝑐𝑡𝑖𝑜𝑛 𝐴 𝑆 by Simulation-Based

Planning

Provide agent with simulation of itself and domain

Generate simulations of future episodes

Evaluate simulation episodes wrt. reward function

Use estimates to refine simulations

Finally: Execute a real action that performed well in simulation

Repeat

19

Page 20: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

3.1 The Framework for

Simulation-Based Planning

20

Page 21: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

The Framework for

Simulation-Based Planning

𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺)

𝑷𝑺𝒊𝒎(𝑺|𝑺 𝒙 𝑨) 21

Page 22: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺)

The Framework for

Simulation-Based Planning

changes

𝑷𝑺𝒊𝒎(𝑺|𝑺 𝒙 𝑨) 22

Page 23: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

𝑷𝑺𝒊𝒎(𝑺|𝑺 𝒙 𝑨)

𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺)

The Framework for

Simulation-Based Planning

Simulate wrt.

strategy and domain

dynamics

23

Page 24: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

The Framework for

Simulation-Based Planning

𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺)

Simulation result

refines strategy

Weighted by

episode reward

𝑷𝑺𝒊𝒎(𝑺|𝑺 𝒙 𝑨) 24

Page 25: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

SBP Parameters

Simulation 𝑃𝑆𝑖𝑚 𝑆 𝑆 𝑥 𝐴)

Agent‘s model/knowledge of domain dynamics

Can be changed at runtime

May differ from real domain dynamics

Can be learned/refined from observations

Maximum search depth ℎ𝑚𝑎𝑥

Impacts simulation effort

Less simulation steps: Fast but shallow planning

Can be dynamically adapted

25

Page 26: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Simulation-Based Planning Algorithm

op plan()

vars s, r, episode, a

s ← state

r ← rewardFct.getReward(s)

episode ← nil

for 0 .. ℎ𝑚𝑎𝑥 do

a ← strategy.sampleAction(s)

s ← simulation.sampleSuccessor(s, a)

episode ← episode::(s, a)

r ← r + rewardFct.getReward(s)

end for

strategy ← updateStrategy(episode, r)

end op

26

Page 27: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Simulation-Based Planning: Plug Points

op plan()

vars s, r, episode, a

s ← state

r ← rewardFct.getReward(s)

episode ← nil

for 0 .. ℎ𝑚𝑎𝑥 do

a ← strategy.sampleAction(s)

s ← simulation.sampleSuccessor(s, a)

episode ← episode::(s, a)

r ← r + rewardFct.getReward(s)

end for

strategy ← updateStrategy(episode, r)

end op

27

Page 28: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Simulation-Based Planning: Variants

Variants define updateStrategy(Episode, Real)

Vanilla Monte Carlo

Genetic Algorithms

Monte Carlo Tree Search

for discrete domains

Cross Entropy Planning

for continuous domains

28

Page 29: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

3.2 Monte Carlo Tree Search

for Discrete Domains

Strategy as tree

Nodes represent states and action choices

Add a node per simulation

Aggregate simulation data in nodes

Reward and frequency

Sample actions w.r.t. aggregated data

Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling, Philipp

Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of

monte carlo tree search methods. Computational Intelligence and AI in Games, IEEE Transactions on,

4(1):1 - 43, 2012. 29

Page 30: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Strategy Inside the Tree

E.g. Upper Confidence Bounds for Trees

Treat action selection as multiarmed bandit

Select actions that maximize

𝑈𝐶𝑇𝑗 = 𝑋𝑗 + 2𝐶2 ln 𝑛

𝑛𝑗

4 / 8 0 / 3 7 / 10

2 / 4 5 / 6 1 / 2 1 / 3 2 / 3

2 / 3 3 / 3

11 / 21

Kocsis, Levente, and Csaba Szepesvári. Bandit based monte-carlo

planning. Machine Learning: ECML 2006. Springer Berlin

Heidelberg, 2006. 282-293.

Cumulated reward Nr. of episodes

30

Page 31: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Strategy Inside the Tree

E.g. Upper Confidence Bounds for Trees

Treat action selection as multiarmed bandit

Select actions that maximize

𝑈𝐶𝑇𝑗 = 𝑋𝑗 + 2𝐶2 ln 𝑛

𝑛𝑗

𝑋𝑗: Average reward of child node j

𝑛: Nr. of episodes from current node

𝑛𝑗 : Nr. of episodes from child node j

𝐶: UCT exploration constant

4 / 8 0 / 3 7 / 10

2 / 4 5 / 6 1 / 2 1 / 3 2 / 3

2 / 3 3 / 3

11 / 21

Kocsis, Levente, and Csaba Szepesvári. Bandit based monte-carlo

planning. Machine Learning: ECML 2006. Springer Berlin

Heidelberg, 2006. 282-293.

Exploit

observations

Explore

solution space

31

Page 32: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Expand the Tree

4 / 8 0 / 3 7 / 10

2 / 4 5 / 6 1 / 2 1 / 3 2 / 3

2 / 3 3 / 3

11 / 21

Kocsis, Levente, and Csaba Szepesvári. Bandit based monte-carlo

planning. Machine Learning: ECML 2006. Springer Berlin

Heidelberg, 2006. 282-293.

0 / 0

Add a new node

When an episode leaves the tree

32

Page 33: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Strategy Outside the Tree

Initial 𝑷𝑨𝒄𝒕𝒊𝒐𝒏(𝑨|𝑺)

Simulate episode to depth ℎ𝑚𝑎𝑥

Observe result

E.g. reward observed

Here: 0 or 1

Reward: 1

4 / 8 0 / 3 7 / 10

2 / 4 5 / 6 1 / 2 1 / 3 2 / 3

2 / 3 3 / 3

11 / 21

0 / 0

33

Page 34: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Update the statistics

This changes the strategy inside the tree

Update Strategy

4 / 8 0 / 3 8 / 11

2 / 4 6 / 7 1 / 2 1 / 3 2 / 3

2 / 3 4 / 4

12 / 22

1 / 1

34

Page 35: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Trees Represent Strategies

MCTS builds a skewed tree

Tree can be interpreted as 𝑃𝐴𝑐𝑡𝑖𝑜𝑛(𝐴|𝑆)

Promising parts of the strategy space are prefered

35

Page 36: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Example Domain

Search and Rescue Victims, fires and ambulances

Unknown topology

Unknown initial situation

Agent actions Noop, Move

Load or drop a victim

Extinguish fire if adjacent

Noise Actions may fail

Fires ignite and cease

Experiment Monte Carlo Tree Search

Large state space (> 1012)

Large branching factor (218)

0.2 seconds/decision

𝑃𝑆𝑖𝑚 𝑆 𝑆 𝑥 𝐴) models domain perfectly

36

Page 37: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Measured (in %)

Victims at ambulance (blue)

Victims in a fire (red)

Positions on fire (green)

Provided reward

Victim at ambulance: +100

System synthesized sensible behavior

Results in 0.95 confidence interval

Checked with MultiVeStA

Experimental Results (I)

Autonomy

Stefano Sebastio and Andrea Vandin. MultiVeStA: statistical

model checking for discrete event simulators. ValueTools '13.

2013. 310-315.

37

Page 38: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Measured (in %)

Victims at ambulance (blue)

Victims in a fire (red)

Positions on fire (green)

Expose system to unexpected events

At steps 20, 40, 60, 80

All carried victims are dropped

New fires break out

Events NOT simulated by planner

New situation incorporated by planner

System showed sensible reactions

Results in 0.95 confidence interval

Experimental Results (II)

Robustness

38

Page 39: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Measured (in %)

Victims at ambulance (blue)

Victims in a fire (red)

Positions on fire (green)

Change system goals while operating

Change of reward function

Steps 0-40: Reward for victims not in a fire

Steps 40-80: Reward for victims at ambulance

Change NOT simulated by planner

But planner incorporates new situation

System adapted behavior wrt. goals

Results in 0.95 confidence interval

Experimental Results (III)

Flexibility

39

Page 40: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

From Discrete to Continuous Domains

Actions

State and action space = ℝn

E.g. (speed, rotation, duration) for actions

Cross Entropy Planning

Approximate (unknown) target distribution

Multivariate Gaussian distribution

Sample state space (locally) and choose „elite“ samples

for updating the strategy (‚sharpen‘ the Gaussian)

Here: Gaussians over sequences of actions

Sequence length = planning depth

Ari Weinstein and Michael L. Littman. Open-loop planning in large-scale stochastic domains.

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013. 40

Page 41: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Cross Entropy Planning

White circle represents agent

Red boxes represent moving victims

Black lines are simulation episodes

Action parameters are speed, rotation and duration

Images show iterations 1, 5 and 10

Simulation depth is adaptive here (reduced simulation cost)

Note the iterative “shaping” of a promising strategy

42

Page 42: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Video: Cross Entropy Planning

43

Page 43: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Video: Cross Entropy Planning

The video showed interleaving planning and execution

Illustrates iterative shaping of a probabilistic strategy

When parallelizing planning and execution, this looks a little

different…

44

Page 44: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Video: Continuous CE Planning

45

Page 45: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Cross Entropy Planning Experiments

Interleaving

Parallel

CE: Cross Entropy Planning

TACE: Time Adaptive CE

C3: Continuous CE Control

h: Planning depth

d: Action duration

Victims left

Tim

e [

s]

46

Page 46: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

Concluding Remarks Motivation

Complex dynamic domains

High degrees of non-determinism

Approach Model a space of solutions, instead of a single one

Online planning: Refine the solution space at runtime wrt. observations and knowledge to determine a currently viable action

This Talk Component framework for Online Planning

Parallelization of execution and planning

Instantiation: Simulation Based Planning Two examples: MCTS, Cross Entropy Planning

Outlook Model learning of domain dynamics

Soft temporal logic for formal (statistical) verification

Learning and planning for ensembles

47

Page 47: A Framework for Simulation-Based Online PlanningA Framework for Simulation-Based Online Planning Martin Wirsing In Kooperation mit Lenz Belzner und Rolf Hennicker, FACS 2015, LNCS

References 1. Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowling,

Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton.

A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence

and AI in Games, 4(1), 2012, 1-43.

2. Kocsis, Levente, and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Machine

Learning: ECML'06. Lecture Notes in Computer Science 4212, 2006, 282-293.

3. Bubeck, Sébastien, and Rémi Munos. Open Loop Optimistic Planning. In: 23rd Conference on

Learning Theory, COLT 2010. Omnipress 2010, 477-489.

4. Ari Weinstein and Michael L. Littman. Open-loop planning in large-scale stochastic domains. In:

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.

5. Stefano Sebastio and Andrea Vandin. MultiVeStA: statistical model checking for discrete event

simulators. In Proceedings of the 7th International Conference on Performance Evaluation

Methodologies and Tools (ValueTools '13). 2013, 310-315.

6. Lenz Belzner, Rolf Hennicker, Martin Wirsing: OnPlan: A Framework for Simulation-Based Online

Planning. In Christiano Braga, Peter Csaba Ölveczky (eds.): Formal Aspects of Component

Software - 12th International Conference, FACS 2015, Revised Selected Papers. Lecture Notes

in Computer Science 9539, 2016, 1-30.

48