Using Communication to Reduce Locality in Multi-Robot Learning

By: Maja J. Mataric

Presentation By: Derak Berreyesa

UNR, CS, 11/17/04

Attempt to bridge the fields of machine learning, robotics and distributed AI.

Deals with two key problems: hidden state and credit assignment.

Hidden State

Situated agents typically can’t sense all the information needed to complete the task and to learn to perform it efficiently.

Credit Assignment

Arises because reinforcement in a distributed system is often provided at a global level, and must somehow be divided among multiple agents whose impact differs and varies over time.

Solving the problems

Apply communication as sensing and as reinforcement, in each case through local undirected broadcast.

Demonstrated the idea on two multi-robot learning experiments.

Two robots

A tightly-coupled coordination task (box pushing.)

Communication for sharing sensory data to overcome hidden state.

Reinforcement data to overcome the credit assignment.

Four Robots

Loosely-coupled task, learning social rules, (yielding and sharing information.)

Uses communication to bridge the gap between global and local payoff.

In both cases

The main goal is to increase the scope of impact of a single agent.

Clusters agents when they are tightly interacting.

Has the effect of making the system less distributed and alleviates the hidden state and credit assignment problems.

Communication as Sensing

Sensors are inaccurate and unreliable.

Interaction between agents is very important.

Communication can be used as a form of sensing.

Things that are hard to sense can be communicated.

Agents that broadcast their state learn better.

There is still inaccuracy in sending the messages.

Communication as Reinforcement

It is hard for multi-agent systems to achieve group-level coherence.

A central controller optimizes over the state space and sends commands to the group.

The information this requires is usually not available, and the computation can’t be completed in real time.

Communication poses a bottleneck.

Reinforcement (cont.)

As multi-agent systems learn, their behavior changes, resulting in inconsistencies.

The credit assignment problem arises at the level of the individual because interaction with the other agents delays the agent’s payoff.

It also arises at the group level because local individual behavior must be associated with global outcomes.

Reinforcement (cont.) again.

Communication as reinforcement enables agents to locally share a reward in order to overcome the credit assignment problem.
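The idea of locally sharing a reward can be sketched as follows. This is a minimal illustration, not the implementation from the paper: the `Agent` class, the `broadcast_reward` function, the 2-D positions and the `radius` parameter are all assumptions made for the example. An agent that receives reinforcement broadcasts it, and every agent within range (the sender included) gets the same amount.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical agent with a 2-D position and a running reward total.
    name: str
    x: float
    y: float
    total_reward: float = 0.0

    def receive_reward(self, amount: float) -> None:
        self.total_reward += amount


def broadcast_reward(sender, agents, amount, radius):
    """Give `amount` to every agent within `radius` of the sender.

    The broadcast is undirected: the sender does not pick recipients,
    it only reaches whoever happens to be nearby (itself included).
    Returns the names of the agents that received the reward.
    """
    receivers = []
    for agent in agents:
        distance = math.hypot(agent.x - sender.x, agent.y - sender.y)
        if distance <= radius:
            agent.receive_reward(amount)
            receivers.append(agent.name)
    return receivers


agents = [Agent("a", 0.0, 0.0), Agent("b", 1.0, 0.0), Agent("c", 10.0, 0.0)]
receivers = broadcast_reward(agents[0], agents, amount=1.0, radius=2.0)
```

Here agents "a" and "b" share the reward, while "c" is out of range and is unaffected, which is what keeps the sharing local.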

Communication for Shared Sensing

2 robots pushing a box.

Box is too heavy for one robot to push alone.

Six-legged robots.

Radio communication mechanisms.

Whiskers that detect contact with the box.

5 sensors that detect the direction and distance to the goal.

Goal marked with a bright light.

Shared Sensing (cont.)

Reinforcement learning framework.

Learning a mapping between its sensors and pre-programmed behaviors:

– Find-box

– Push-forward

– Push-left

– Push-right

– Stop

– Send-msg

Shared Sensing (cont.) again.

Algorithm chose “best” action 75% of the time and random action 25% of the time.
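The 75%/25% split can be sketched as a simple selection rule. This is only an illustration under assumptions: the `choose_action` function, the `values` table and its numbers are hypothetical, and the slides do not say how the "best" action was scored. The behavior names are taken from the list of pre-programmed behaviors.

```python
import random


def choose_action(values, p_best=0.75, rng=random):
    """Pick the highest-valued behavior with probability `p_best`,
    otherwise pick uniformly at random among all behaviors.

    `values` maps a behavior name to its current learned value.
    Note that the random branch can also return the best behavior.
    """
    if rng.random() < p_best:
        return max(values, key=values.get)
    return rng.choice(list(values))


# Hypothetical learned values for one sensor state.
values = {
    "find-box": 0.1,
    "push-forward": 0.9,
    "push-left": 0.3,
    "push-right": 0.2,
    "stop": 0.0,
    "send-msg": 0.4,
}

rng = random.Random(0)  # seeded only to make the example repeatable
picks = [choose_action(values, rng=rng) for _ in range(1000)]
```

The random 25% keeps the robot exploring behaviors it currently rates poorly, so a wrong early estimate does not lock in a bad policy.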

Hidden state problem was solved by having the two agents pool their sensory resources.

Credit assignment is solved by each agent telling the other what action to perform, observing the outcome, and sharing the reward or punishment.
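Pooling sensory resources can be pictured as each robot learning over a joint state built from its own readings plus the readings its partner broadcast. The sketch below is an assumption-laden illustration: the two-boolean `SensorReading`, the `joint_state` and `record_outcome` helpers, and the running-average update are all hypothetical and are not the update rule from the paper.

```python
from typing import Dict, Tuple

# Hypothetical reading: (whisker touches box, goal light seen ahead).
SensorReading = Tuple[bool, bool]


def joint_state(own: SensorReading, received: SensorReading) -> tuple:
    """Combine local readings with the partner's broadcast readings."""
    return own + received


# Value table keyed by (joint state, behavior name).
values: Dict[tuple, float] = {}


def record_outcome(state: tuple, behavior: str, reward: float,
                   rate: float = 0.1) -> float:
    """Move the stored value toward the shared reward (running average)."""
    key = (state, behavior)
    old = values.get(key, 0.0)
    values[key] = old + rate * (reward - old)
    return values[key]


state = joint_state((True, False), (False, True))
updated = record_outcome(state, "push-left", reward=1.0)
```

Because the partner's readings are part of the key, two situations that look identical to one robot's own sensors can still map to different entries.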

Shared Sensing (cont.) once more.

Desired policy was learned by both robots in over 85% of the trials.

It was learned in 7.3 minutes on average.

There were about 40 trials.

Each robot learned differently depending on which side it was on.

Communication for Shared Reward

4 robots.

2 social rules:

– Yielding to each other

– Sharing information about the location of pucks

Infra-red and bump sensors.

Radio communication.

Sensors don’t give information about robots’ external state or behavior.

Shared Reward (cont.)

Basis behaviors:

– Pickup

– Drop

– Home

– Wander

– Follow

– Send-msg

Shared Reward (cont.) again.

The goal was for each robot to improve its individual collection of pucks while still yielding to others and sending messages so that robots know when to follow or proceed to the location.

The “best” behavior was chosen 60% of the time and a random one 10% of the time; the robot would follow 30% of the time.

It is difficult for robots to learn social rules, and credit assignment was a major problem.

Shared Reward (cont.) once more.

Locally sharing information was sufficient to enable the group to learn social behaviors.

The social policies of yielding and sharing were learned by all robots in over 90% of the trials.

They were learned in 20 to 25 minutes on average.

Unlike the first experiment, sensory information was individual and reinforcement was shared.

This took care of the credit assignment problem.

Radio broadcast communication proved to be robust.

Lost data was mostly ignored, but did slow down learning a little.

Conclusions

Dealt with two problems of learning in multi-agent environments.

Simple communication over local broadcast can be used to address both problems.

We had fun doing it!!!!
