By: Messias, Spaan, Lima Presented by: Mike Plasker DMES – Ocean Engineering

GSMDPs for Multi-Robot Sequential Decision Making


Introduction
• Robotic planning under uncertainty
• MDP solutions
• Limited real-world application

Assumptions for Multi-Robot Teams
• Communication (inexpensive, free, or costly)
• Synchronous and steady-state transitions
• Discretization of environment

A Different Approach
• States and actions discrete (like MDP)
• Continuous measure of time
• State transitions regarded as random ‘events’

Advantages
• Non-Markovian effects of discretization minimized
• Fully reactive to changes
• Communication only required for ‘events’

GSMDPs (Generalized Semi-Markov Decision Processes)
• Generic temporal probability distributions over events
• Can model concurrent (persistently enabled) events
• Solvable by discrete-time MDP algorithms by obtaining an equivalent (semi-)Markovian model
• Avoids negative effects of synchronous alternatives

Why GSMDPs for Robotics
Cooperative robotics requires:
• Operation in inherently continuous environments
• Handling uncertainty in actions (and observations)
• Joint decision making for optimization
• Reactive behavior

Definitions
Multiagent GSMDP: tuple <d, S, X, A, T, F, R, C, h>
• d = number of agents
• S = state space (contains state factors)
• X = state factors
• A = set of joint actions
• T = transition function
• F = time model
• R = instantaneous reward function
• C = cumulative reward rate
• h = planning horizon over continuous time
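As a rough illustration, the tuple can be held in a small container. The Python sketch below is not the authors' code: the class name `MultiagentGSMDP`, the field types, the choice to represent the time model F as a density over transition time, and all toy values are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple

State = Tuple[str, ...]        # one value per state factor
JointAction = Tuple[str, ...]  # one action per agent


@dataclass
class MultiagentGSMDP:
    """Hypothetical container mirroring the tuple <d, S, X, A, T, F, R, C, h>."""
    d: int                                            # number of agents
    X: List[List[str]]                                # state factors (domain of each factor)
    A: List[JointAction]                              # set of joint actions
    T: Callable[[State, JointAction, State], float]   # transition function
    F: Callable[[State, JointAction, State, float], float]  # time model (assumed: density at time t)
    R: Callable[[State, JointAction], float]          # instantaneous reward function
    C: Callable[[State, JointAction], float]          # cumulative reward rate
    h: float                                          # planning horizon over continuous time

    @property
    def S(self) -> List[State]:
        """State space built as the Cartesian product of the state factors."""
        return list(product(*self.X))


# Toy instance with made-up values (two agents, two binary state factors).
toy = MultiagentGSMDP(
    d=2,
    X=[["left", "right"], ["has_ball", "no_ball"]],
    A=[("wait", "wait"), ("pass", "move")],
    T=lambda s, a, s2: 0.25,               # uniform placeholder over the 4 states
    F=lambda s, a, s2, t: math.exp(-t),    # placeholder: exponential density, rate 1
    R=lambda s, a: 0.0,
    C=lambda s, a: 0.0,
    h=60.0,
)
```

Here S is derived from X rather than stored, which reflects the slide's note that the state space contains the state factors.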

Definitions (cont.)
• Event in a GSMDP: an abstraction over state transitions that share the same properties
• Persistently enabled events: events that are enabled from step ‘t’ to step ‘t+1’, but not triggered at step ‘t’

Common Approach
• Synchronous action
• Pre-defined time step
  • Performance
  • Reaction time

GSMDPs
• Persistently enabled events modeled by allowing their temporal distributions to depend on the time they were enabled
• Explicit modeling of non-Markovian effects from discretization
• Communication efficiency
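To see why the time since enabling matters, consider an event whose trigger time is not exponentially distributed. The Python sketch below is an illustration under stated assumptions, not the paper's method: the uniform 0 to 10 s time model and the names `battery_low_duration`, `sample_remaining`, and `mean_remaining` are hypothetical. It samples the remaining time of an event that has already been enabled for `elapsed` seconds without triggering.

```python
import random


def battery_low_duration(rng: random.Random) -> float:
    """Hypothetical time model: the event triggers uniformly within 10 s of being enabled."""
    return rng.uniform(0.0, 10.0)


def sample_remaining(sample_duration, elapsed, rng, max_tries=100_000):
    """Rejection-sample the time left for an event enabled `elapsed` seconds ago
    that has not triggered yet (total duration conditioned on exceeding `elapsed`)."""
    for _ in range(max_tries):
        total = sample_duration(rng)
        if total > elapsed:
            return total - elapsed
    raise RuntimeError("elapsed time is outside the support of the time model")


def mean_remaining(elapsed, n=20_000, seed=0):
    """Monte Carlo estimate of the expected remaining time."""
    rng = random.Random(seed)
    samples = (sample_remaining(battery_low_duration, elapsed, rng) for _ in range(n))
    return sum(samples) / n
```

With this toy model the expected remaining time is about 5 s for a freshly enabled event and about 2 s once 6 s have elapsed, so resampling the event from scratch at every decision step would misrepresent it.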

Modeling Events
• Group state transitions as events to minimize the number of temporal distributions and transition functions to specify (e.g. ‘battery low’)
• Transition function found by estimating relative frequency of each transition in the event
• Time model found by timing the transition data
• Approximated as a phase-type distribution
• Replaces events with acyclic Markov chains
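The estimation steps above can be sketched in a few lines of Python. The slides do not say which fitting procedure is used, so this example swaps in a simple stand-in: moment matching to an Erlang distribution, which is a phase-type distribution made of k exponential phases in series (an acyclic Markov chain). The function name `estimate_event` and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def estimate_event(samples):
    """samples: list of (next_state, duration) pairs observed for one event.

    Returns (transition, k, rate):
      transition: relative frequency of each next state
      k, rate:    Erlang(k, rate) matched to the mean and variance of the durations
    """
    counts = Counter(next_state for next_state, _ in samples)
    transition = {s: c / len(samples) for s, c in counts.items()}

    durations = [t for _, t in samples]
    m = mean(durations)
    v = pvariance(durations)
    # Erlang(k, rate) has mean k/rate and variance k/rate**2, so k = m**2 / v.
    k = max(1, round(m * m / v)) if v > 0 else 1
    rate = k / m
    return transition, k, rate


# Made-up observations of a 'battery low' event.
observations = [("charging", 2.0), ("charging", 4.0), ("stranded", 3.0), ("charging", 3.0)]
transition, k, rate = estimate_event(observations)
```

A general phase-type fit would allow more flexible chain structures; the Erlang case is used here only because it has a closed-form match.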

Events (cont.)
• Not always possible
• Decompose events with minimum duration into deterministically timed transitions
• Can then better approximate using phase-type distribution
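A small numerical sketch of why the decomposition helps, assuming the minimum duration is known and using Erlang moment matching as a stand-in for a phase-type fit (the slides do not specify the procedure). The helper names and the duration data are hypothetical.

```python
from statistics import mean, pvariance


def erlang_phases(durations):
    """Number of Erlang phases that matches the mean and variance of the data."""
    m = mean(durations)
    v = pvariance(durations)
    if v == 0:
        raise ValueError("durations are deterministic; no finite Erlang matches them")
    return max(1, round(m * m / v))


def decompose(durations, min_duration):
    """Split each duration into a fixed delay plus a non-negative random remainder."""
    remainders = [t - min_duration for t in durations]
    if min(remainders) < 0:
        raise ValueError("min_duration exceeds an observed duration")
    return min_duration, remainders


# Made-up timings of an event that never triggers before 10 s.
durations = [10.5, 11.0, 12.0, 10.5]
phases_direct = erlang_phases(durations)
delay, remainders = decompose(durations, min_duration=10.0)
phases_after_split = erlang_phases(remainders)
```

In this toy data the direct fit needs hundreds of phases to reproduce the narrow spread around 11 s, while the remainder after the 10 s delay is matched with only a few.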

Solving a GSMDP
• Can be viewed as an equivalent discrete-time MDP
• Almost all solution algorithms for MDPs work
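As one example of a standard MDP algorithm that could be applied once the equivalent discrete-time model is available, here is a minimal value iteration sketch in Python. The conversion from a GSMDP to that model is not shown; the dictionary layout, the discount factor, and the two-state toy problem are assumptions for illustration.

```python
def value_iteration(P, R, gamma=0.95, tol=1e-9):
    """P[s][a] is a list of (probability, next_state); R[s][a] is the reward.

    Returns the value function and a greedy policy for a discounted MDP.
    """
    def q(s, a, V):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q(s, a, V) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q(s, a, V)) for s in P}
    return V, policy


# Toy problem: 'go' earns a one-off reward and moves to an absorbing state.
P = {
    "start": {"stay": [(1.0, "start")], "go": [(1.0, "end")]},
    "end": {"stay": [(1.0, "end")]},
}
R = {"start": {"stay": 0.0, "go": 1.0}, "end": {"stay": 0.0}}
V, policy = value_iteration(P, R, gamma=0.9)
```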

Experiment
• Robotic soccer
• Score a goal (reward 150)
• Passing around obstacle (reward 60)

Results
[Figure panels: MDP with T = 4 s; GSMDP]

Results (cont.)
• No idle time
• Reduced communication
• Improved scoring efficiency
• System failures (zero goals) independent of model

Example Video

http://vimeo.com/57942910

Future Work
• Extend to partially observable domains
• Apply bilateral phase distributions to increase the class of non-Markovian events that are able to be modeled

Questions?

MESSIAS, J.; SPAAN, M.; LIMA, P. GSMDPs for Multi-Robot Sequential Decision-Making. AAAI Conference on Artificial Intelligence, North America, Jun. 2013. Available at: <http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6432/6843>. Date accessed: 06 Apr. 2014.
