GSMDPs for Multi-Robot Sequential Decision Making By: Messias, Spaan, Lima Presented by: Mike Plasker DMES – Ocean Engineering

GSMDPs for Multi-Robot Sequential Decision Making

By: Messias, Spaan, Lima

Presented by: Mike Plasker
DMES – Ocean Engineering

Introduction
• Robotic planning under uncertainty
• MDP solutions
• Limited real-world application

Assumptions for Multi-Robot Teams
• Communication (inexpensive, free, or costly)
• Synchronous and steady-state transitions
• Discretization of environment

A Different Approach
• States and actions discrete (like MDP)
• Continuous measure of time
• State transitions regarded as random ‘events’

Advantages
• Non-Markovian effects of discretization minimized
• Fully reactive to changes
• Communication only required for ‘events’

GSMDPs
• Generic temporal probability distributions over events
• Can model concurrent (persistently enabled) events
• Solvable by discrete-time MDP algorithms by obtaining an equivalent (semi-)Markovian model
• Avoids negative effects of synchronous alternatives

Why GSMDPs for Robotics
Cooperative robotics requires:
• Operation in inherently continuous environments
• Handling uncertainty in actions (and observations)
• Joint decision making for optimization
• Reactivity

Definitions
Multiagent GSMDP: tuple <d, S, X, A, T, F, R, C, h>
• d = number of agents
• S = state space (contains state factors)
• X = state factors
• A = set of joint actions
• T = transition function
• F = time model
• R = instantaneous reward function
• C = cumulative reward rate
• h = planning horizon, over continuous time
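
As a rough illustration, the tuple can be written down as a small Python container. This is a sketch only: the field types and callable signatures are assumptions made for the example, not definitions from the paper, and the toy values are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Tuple[Hashable, ...]        # one value per state factor
JointAction = Tuple[Hashable, ...]  # one individual action per agent
Event = str                         # assumed: events are identified by name


@dataclass(frozen=True)
class MultiagentGSMDP:
    d: int                                   # number of agents
    X: Sequence[Sequence[Hashable]]          # state factors
    A: Sequence[JointAction]                 # set of joint actions
    # Assumed signature: next-state probabilities for an event.
    T: Callable[[State, JointAction, Event], Dict[State, float]]
    # Assumed signature: draws a duration (seconds) for an event.
    F: Callable[[Event], float]
    R: Callable[[State, JointAction], float]  # instantaneous reward
    C: Callable[[State, JointAction], float]  # cumulative reward rate
    h: float                                  # horizon, continuous time

    @property
    def S(self) -> List[State]:
        """State space, built as the product of the state factors."""
        return list(product(*self.X))


# Hypothetical two-robot toy instance.
toy = MultiagentGSMDP(
    d=2,
    X=[("low", "ok"), ("left", "right")],
    A=[("wait", "wait"), ("move", "wait")],
    T=lambda s, a, e: {s: 1.0},
    F=lambda e: 1.0,
    R=lambda s, a: 0.0,
    C=lambda s, a: 0.0,
    h=60.0,
)
```

Deriving S from X mirrors the slide's note that the state space contains the state factors.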

Definitions
• Event in a GSMDP: an abstraction over state transitions that share the same properties
• Persistently enabled events: events that are enabled from step ‘t’ to step ‘t+1’, but not triggered at step ‘t’
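
One way to picture persistent enablement is as a race between event clocks. The sketch below is an illustration, not the paper's formulation: the earliest event fires, and every other enabled event keeps the trigger time it already had instead of being resampled. The event names and times are hypothetical.

```python
def fire_next_event(trigger_times):
    """Fire the earliest event in a race between enabled events.

    trigger_times maps each enabled event to the absolute time at
    which it would trigger. Events that lose the race stay enabled
    and keep their original trigger time.
    """
    event = min(trigger_times, key=trigger_times.get)
    fired_at = trigger_times[event]
    still_enabled = {e: t for e, t in trigger_times.items() if e != event}
    return event, fired_at, still_enabled


# Hypothetical clocks, in seconds since both events were enabled.
clocks = {"ball_reached": 2.5, "battery_low": 7.0}
event, fired_at, remaining = fire_next_event(clocks)
time_left = remaining["battery_low"] - fired_at
```

Here `battery_low` has 4.5 s left after the first event, rather than a fresh 7.0 s, because its clock depends on when it was enabled.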

Common Approach
• Synchronous action
• Pre-defined time step
    • Performance
    • Reaction time
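
The effect of a pre-defined time step on reaction time can be shown with a little arithmetic. In this sketch (an assumption about how a synchronous controller behaves, not code from the paper), the team can only react at multiples of the step T, so a change that happens mid-step is ignored until the next decision instant.

```python
import math


def reaction_delay(event_time, step):
    """Time between an event and the next synchronous decision instant."""
    next_decision = math.ceil(event_time / step) * step
    return next_decision - event_time


# With a 4 s step, an event at t = 5 s is only handled at t = 8 s.
delay = reaction_delay(5.0, 4.0)
```

A shorter step reduces this delay but means more decision steps and more frequent synchronization.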

GSMDPs
• Persistently enabled events modeled by allowing their temporal distributions to depend on the time they were enabled
• Explicit modeling of non-Markovian effects from discretization
• Communication efficiency

Modeling Events
• Group state transitions into events to minimize the number of temporal distributions and transitions (e.g., battery low)
• Transition function found by estimating the relative frequency of each transition in the event
• Time model found by timing the transition data
• Approximated as a phase-type distribution
• Replaces events with acyclic Markov chains
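
The estimation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample format is hypothetical, and the time model is fitted by moment matching to an Erlang distribution, a simple acyclic phase-type distribution chosen here for brevity. The fitting procedure used in the paper may differ.

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance


def estimate_transition_function(samples):
    """Relative frequency of each (source -> target) transition.

    samples: iterable of (source_state, target_state, duration) tuples
    recorded for a single event (hypothetical data format).
    """
    counts = defaultdict(Counter)
    for source, target, _duration in samples:
        counts[source][target] += 1
    return {
        source: {target: n / sum(c.values()) for target, n in c.items()}
        for source, c in counts.items()
    }


def fit_erlang(durations):
    """Moment-match an Erlang(k, rate) distribution to timing data.

    An Erlang-k is a chain of k exponential phases with mean k / rate
    and squared coefficient of variation 1 / k.
    """
    m = mean(durations)
    var = pvariance(durations)
    if var == 0:
        raise ValueError("zero variance: durations are deterministic")
    k = max(1, round(m * m / var))
    return k, k / m


# Hypothetical "battery low" observations: (from, to, seconds).
samples = [
    ("ok", "low", 2.0),
    ("ok", "low", 4.0),
    ("ok", "empty", 3.0),
    ("ok", "low", 3.0),
]
T_hat = estimate_transition_function(samples)
k, rate = fit_erlang([duration for _, _, duration in samples])
```

The fitted chain has k phases, each left at the same rate, which is the kind of acyclic Markov chain that can stand in for the original event.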

Events (cont.)
• Not always possible
• Decompose events with minimum duration into deterministically timed transitions
• Can then better approximate using phase-type distribution
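
A hedged sketch of the decomposition idea: an event that can never finish before some minimum duration is split into a fixed delay plus a random remainder. The helper names and data are hypothetical. The second function uses the standard fact that an Erlang-k with a fixed mean has standard deviation mean / sqrt(k), so a long chain of phases approaches a deterministic delay.

```python
def split_minimum_duration(durations, minimum):
    """Split each duration into a fixed delay and a random remainder."""
    if any(d < minimum for d in durations):
        raise ValueError("a duration is shorter than the stated minimum")
    return minimum, [d - minimum for d in durations]


def erlang_std(mean_delay, k):
    """Standard deviation of an Erlang-k distribution with the given mean."""
    return mean_delay / k ** 0.5


# Hypothetical timings for an event that always takes at least 5 s.
delay, residuals = split_minimum_duration([5.5, 6.0, 7.5], 5.0)
spread = erlang_std(delay, 25)  # 25 phases approximate the fixed delay
```

The remainder can then be fitted separately, while the number of phases used for the fixed delay trades model size against timing accuracy.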

Solving a GSMDP
• Can be viewed as an equivalent discrete-time MDP
• Almost all solution algorithms for MDPs work
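
Once an equivalent discrete-time MDP is available, a standard solver applies. The sketch below uses plain value iteration, one such algorithm among many. The toy model is hypothetical: only the reward values 150 and 60 echo the experiment slide, and the states, transitions, and discount factor are invented for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite discrete-time MDP.

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the immediate reward for taking a in s.
    """
    def q(s, a, V):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q(s, a, V) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q(s, a, V)) for s in P}
    return V, policy


# Hypothetical soccer-like toy: pass around an obstacle, then shoot.
P = {
    "blocked": {"pass": [(1.0, "open")], "shoot": [(1.0, "end")]},
    "open": {"shoot": [(1.0, "end")]},
    "end": {"stay": [(1.0, "end")]},
}
R = {
    "blocked": {"pass": 60.0, "shoot": 0.0},
    "open": {"shoot": 150.0},
    "end": {"stay": 0.0},
}
V, policy = value_iteration(P, R)
```

In this toy, passing first is optimal from the blocked state because the pass reward plus the discounted goal reward exceeds a blocked shot.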

Experiment
• Robotic soccer
• Score a goal (reward 150)
• Passing around obstacle (reward 60)

Results
• MDP: T = 4 s
• GSMDP

Results
• No idle time
• Reduced communication
• Improved scoring efficiency
• System failures (zero goals) independent of model

Example Video

Future Work
• Extend to partially observable domains
• Apply bilateral phase distributions to increase the class of non-Markovian events that can be modeled

Questions?

MESSIAS, J.; SPAAN, M.; LIMA, P. GSMDPs for Multi-Robot Sequential Decision-Making. AAAI Conference on Artificial Intelligence, North America, Jun. 2013. Available at: <http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6432/6843>. Date accessed: 06 Apr. 2014.