Using Communication to Reduce Locality in Multi-Robot Learning
By: Maja J. Mataric
Presentation By: Derak Berreyesa
UNR, CS, 11/17/04
Attempt to bridge the fields of machine learning, robotics and distributed AI.
Deals with two key problems: hidden state and credit assignment.
Hidden State
Situated agents typically can’t sense all the information needed to complete the task and to learn to perform it efficiently.
Credit Assignment
Arises because reinforcement in a distributed system is often provided at a global level, and must somehow be divided among multiple agents whose impact differs and varies over time.
Solving the problems
Apply communication as sensing and as reinforcement, in each case through local undirected broadcast.
Demonstrated the idea on two multi-robot learning experiments.
Two robots
A tightly coupled coordination task (box pushing).
Communication for sharing sensory data to overcome hidden state.
Communication of reinforcement data to overcome the credit assignment problem.
Four Robots
A loosely coupled task: learning social rules (yielding and sharing information).
Uses communication to bridge the gap between global and local payoff.
In both cases
The main goal is to increase the scope of impact of a single agent.
Clusters agents when they are tightly interacting.
Has the effect of making the system less distributed and alleviates the hidden state and credit assignment problems.
Communication as Sensing
Sensors are inaccurate and unreliable.
Interaction between agents is very important.
Communication can be used as a form of sensing.
Things that are hard to sense can be communicated.
Agents that broadcast their state learn better.
There is still inaccuracy in sending the messages.
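The idea of treating a received message as one more sensor reading can be sketched as follows. This is only an illustration: the function `pool_state`, the dictionary layout, and the sensor names are hypothetical and not taken from the paper. A lost message is modeled as `None`, in which case the agent falls back on its own readings.

```python
from typing import Optional

def pool_state(own: dict, received: Optional[dict]) -> dict:
    """Combine local sensor readings with a peer's broadcast readings.

    If the broadcast was lost (None), only the local readings are used,
    so the agent still acts on partial information.
    """
    pooled = {f"self_{name}": value for name, value in own.items()}
    if received is not None:
        pooled.update({f"peer_{name}": value for name, value in received.items()})
    return pooled

# Hypothetical readings: contact with an object and a coarse goal direction.
own = {"contact": True, "goal_direction": 2}
peer = {"contact": False, "goal_direction": 4}

print(pool_state(own, peer))   # state seen when the message arrives
print(pool_state(own, None))   # state seen when the message is lost
```

Prefixing the keys keeps the agent's own readings and the peer's readings distinct, so a learner can treat them as separate inputs.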
Communication as Reinforcement
It is hard for multi-agent systems to achieve group-level coherence.
One approach is a central controller that optimizes over the state space and sends commands to the group.
The information this requires is usually not available, and the computation can’t be completed in real time.
Communication poses a bottleneck.
Reinforcement (cont.)
As multi-agent systems learn, their behavior changes, resulting in inconsistencies.
The credit assignment problem arises at the level of the individual because interaction with the other agents delays the agent’s payoff.
It also arises at the group level because local individual behavior must be associated with global outcomes.
Reinforcement (cont.) again.
Communication as reinforcement enables agents to locally share a reward in order to overcome the credit assignment problem.
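One way to picture this, as a sketch under stated assumptions: each agent broadcasts the reward it just received, and every agent in radio range adds the rewards it hears to its own before updating what it has learned. The functions `shared_reward` and `update_value`, the plain sum, and the learning rate are illustrative choices, not the formulation used in the paper.

```python
from typing import Sequence

def shared_reward(own_reward: float, heard_rewards: Sequence[float]) -> float:
    """Reinforcement an agent learns from: its own reward plus the
    rewards broadcast by nearby agents (assumed here: a plain sum)."""
    return own_reward + sum(heard_rewards)

def update_value(old_value: float, reward: float, rate: float = 0.1) -> float:
    """Move a stored value estimate a small step toward the reinforcement."""
    return old_value + rate * (reward - old_value)

# An agent that yielded earned nothing itself, but hears a peer's reward of 1.0.
reward = shared_reward(0.0, [1.0])
value = update_value(0.0, reward)
print(reward, value)
```

Without the broadcast, the yielding agent would see a reward of zero and have no reason to repeat a behavior that helped the group.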
Communication for Shared Sensing
2 robots pushing a box.
Box is too heavy for one robot to push alone.
Six-legged robots.
Radio communication mechanisms.
Whiskers that detect contact with the box.
5 sensors that detect direction and distance from the goal.
Goal marked with a bright light.
Shared Sensing (cont.)
Reinforcement learning framework.
Learning a mapping between its sensors and pre-programmed behaviors:
– Find-box
– Push-forward
– Push-left
– Push-right
– Stop
– Send-msg
Shared Sensing (cont.) again.
Algorithm chose “best” action 75% of the time and random action 25% of the time.
Hidden state problem was solved by having the two agents pool their sensory resources.
Credit assignment was solved by having each agent tell the other what action to perform, observe the outcome, and share the reward or punishment.
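The 75%/25% action selection rule can be sketched as below. The behavior names come from the slides; the value table keyed by `(state, behavior)`, the state label `"box_contact"`, and the function `choose_action` are assumptions made for illustration, not the paper's implementation.

```python
import random

BEHAVIORS = ["find-box", "push-forward", "push-left",
             "push-right", "stop", "send-msg"]

def choose_action(values: dict, state: str, rng: random.Random,
                  p_best: float = 0.75) -> str:
    """Pick the highest-valued behavior with probability p_best,
    otherwise pick uniformly at random among all behaviors."""
    if rng.random() < p_best:
        return max(BEHAVIORS, key=lambda b: values.get((state, b), 0.0))
    return rng.choice(BEHAVIORS)

rng = random.Random(0)
# Hypothetical learned values: pushing forward is best when touching the box.
values = {("box_contact", "push-forward"): 1.0}
picks = [choose_action(values, "box_contact", rng) for _ in range(1000)]
print(picks.count("push-forward") / 1000)
```

The printed fraction comes out somewhat above 0.75, because the random branch can also land on the best behavior by chance.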
Shared Sensing (cont.) once more.
Desired policy was learned by both robots in over 85% of the trials.
It was learned on average in 7.3 minutes.
There were about 40 trials.
Each robot learned differently depending on which side it was on.
Communication for Shared Reward
4 robots.
2 social rules:
– Yielding to each other
– Sharing information about the location of pucks
Infra-red and bump sensors.
Radio communication.
Sensors don’t give information about other robots’ external state or behavior.
Shared Reward (cont.)
Basis behaviors:
– Pickup
– Drop
– Home
– Wander
– Follow
– Send-msg
Shared Reward (cont.) again.
Goal was to have each robot improve its individual collection of pucks while still yielding to the others and sending messages, so that robots know when to follow or proceed to the location.
The “best” behavior was chosen 60% of the time and a random behavior 10% of the time; the robot would follow 30% of the time.
Social rules are difficult for robots to learn, and credit assignment was a major problem.
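The 60/30/10 split can be sketched as a weighted draw among three modes. The mode labels and the function `choose_mode` are hypothetical; what "follow" does once selected is left out here because the slides do not specify it.

```python
import random

def choose_mode(rng: random.Random) -> str:
    """Return 'best', 'follow', or 'random' with a 60/30/10 split."""
    return rng.choices(["best", "follow", "random"], weights=[60, 30, 10])[0]

rng = random.Random(1)
counts = {"best": 0, "follow": 0, "random": 0}
for _ in range(10000):
    counts[choose_mode(rng)] += 1
print(counts)
```

Over many draws the counts settle close to the 60/30/10 proportions, with the small random share keeping some exploration in the mix.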
Shared Reward (cont.) once more.
Locally sharing information was sufficient to enable the group to learn social behaviors.
The social policies of yielding and sharing were learned by all robots in over 90% of the trials.
They were learned in 20 to 25 minutes on average.
Unlike the first experiment, sensory information was individual and reinforcement was shared.
This took care of the credit assignment problem.
Radio broadcast communication proved to be robust.
Lost data was mostly ignored, but did slow down learning a little.
Conclusions
Dealt with two problems of learning in multi-agent environments: hidden state and credit assignment.
Simple communication over local broadcast can be used to address both problems.
We had fun doing it!!!!