Learning Momentum: Integration and Experimentation
Brian Lee and Ronald C. Arkin
Mobile Robot Laboratory, Georgia Tech, Atlanta, GA
Motivation
- It is hard to manually derive controller parameters: the parameter space grows exponentially with the number of parameters.
- A priori knowledge of the environment is not always available. Without it, a user cannot confidently choose appropriate parameter values, so the robot must adapt on its own to what it finds.
- Obstacle densities and layouts in the environment may be heterogeneous: parameters that work well in one type of environment may not work well in another.
Adaptation and Learning Methods – DARPA MARS
- Investigate robot shaping at five distinct levels in a hybrid robot software architecture
- Implement algorithms within the MissionLab mission specification system
- Conduct experiments to evaluate the performance of each technique
- Combine techniques where possible
- Integrate on a platform more suitable for realistic missions and continue development
Overview of Techniques
- CBR Wizardry – guide the operator
- Probabilistic Planning – manage complexity for the operator
- RL for Behavioral Assemblage Selection – learn what works for the robot
- CBR for Behavior Transitions – adapt to situations the robot can recognize
- Learning Momentum – vary robot parameters in real time

The learning continuum (figure): deliberative (premission) ... behavioral switching ... reactive (online adaptation).
Basic Concepts of LM
- Provides adaptability to behavior-based systems; a crude form of reinforcement learning.
- If the robot is doing well, keep doing what it is doing; otherwise, try something different.
- Behavior parameters are changed in response to progress and obstacles.
- The system is still fully reactive: although the robot changes its behavior, there is no deliberation.
Currently Used Behaviors
- Move to Goal: always returns a vector pointing toward the goal position.
- Avoid Obstacles: returns a sum of weighted vectors pointing away from obstacles.
- Wander: returns vectors pointing in random directions.
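Each behavior produces a single vector. A minimal Python sketch of the three (the function names, the linear repulsion falloff, and the mutable-default wander state are my assumptions, not MissionLab's implementation):

```python
import math
import random

def move_to_goal(robot_pos, goal_pos, gain):
    """Unit vector toward the goal, scaled by the goal gain."""
    dx, dy = goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1]
    dist = math.hypot(dx, dy) or 1.0   # avoid divide-by-zero at the goal
    return (gain * dx / dist, gain * dy / dist)

def avoid_obstacles(robot_pos, obstacles, gain, sphere):
    """Sum of weighted vectors pointing away from each obstacle inside the
    sphere of influence (linear falloff with distance is an assumption)."""
    vx = vy = 0.0
    for ox, oy in obstacles:
        dx, dy = robot_pos[0] - ox, robot_pos[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < sphere:
            weight = gain * (sphere - dist) / sphere  # closer -> stronger push
            vx += weight * dx / dist
            vy += weight * dy / dist
    return (vx, vy)

def wander(gain, persistence, _state={"step": 0, "heading": 0.0}):
    """Random-direction vector, held for `persistence` consecutive steps.
    (Mutable-default state is used only to keep the sketch short.)"""
    if _state["step"] % persistence == 0:
        _state["heading"] = random.uniform(0.0, 2.0 * math.pi)
    _state["step"] += 1
    return (gain * math.cos(_state["heading"]), gain * math.sin(_state["heading"]))
```

Each function returns an (x, y) vector whose magnitude is controlled by its gain, which is what makes these behaviors adjustable by LM.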
Adjustable Parameters
- Move-to-goal vector gain
- Avoid-obstacle vector gain
- Avoid-obstacle sphere of influence: the radius around the robot inside which obstacles are perceived
- Wander vector gain
- Wander persistence: the number of consecutive steps the wander vector points in the same direction
Four Predefined Situations
Let M = average movement, M_goal = average movement toward the goal, P = M_goal / M, and O_count = obstacles encountered; T_movement, T_progress, and T_obstacles are the corresponding thresholds.
- No movement: M < T_movement
- Progress toward the goal: M > T_movement, P > T_progress
- No progress with obstacles: M > T_movement, P < T_progress, O_count > T_obstacles
- No progress without obstacles: M > T_movement, P < T_progress, O_count < T_obstacles
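Under those definitions the situation test is a short cascade. A sketch (the threshold values here are illustrative placeholders, not the paper's):

```python
# Illustrative thresholds; the actual values used in the paper are not given here.
T_MOVEMENT, T_PROGRESS, T_OBSTACLES = 0.1, 0.5, 3

def classify(m_avg, m_goal_avg, o_count):
    """Map averaged readings onto one of the four predefined situations.
    m_avg = M, m_goal_avg = M_goal, o_count = O_count."""
    p = m_goal_avg / m_avg if m_avg > 0 else 0.0   # P = M_goal / M
    if m_avg < T_MOVEMENT:
        return "no_movement"
    if p > T_PROGRESS:
        return "progress"
    if o_count > T_OBSTACLES:
        return "no_progress_with_obstacles"
    return "no_progress_without_obstacles"
```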
Parameter Adjustments

Situation                   | Goal Gain   | Obstacle Gain | Sphere      | Noise Gain  | Persistence
No Movement                 | -0.1 to 0.0 | -0.1 to 0.0   | -0.5 to 0.0 |  0.1 to 0.5 |  0 to 1
Progress                    |  0.5 to 1.0 | -0.1 to 0.0   | -0.5 to 0.0 | -0.1 to 0.0 | -1 to 0
No Progress w/ Obstacles    | -0.1 to 0.0 |  0.1 to 0.5   |  0.0 to 0.5 |  0.0 to 0.1 |  0 to 1
No Progress w/out Obstacles |  0.0 to 0.3 | -0.1 to 0.0   |  0.0 to 0.5 | -0.2 to 0.0 | -1 to 0

Sample adjustment parameters for ballooning.
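Applied each step, an adjustment amounts to drawing a random delta from the range given for the current situation and adding it to each parameter. A sketch using the ballooning values above (the dictionary layout and the clamp at zero are my assumptions):

```python
import random

# Delta ranges (low, high) per parameter for the ballooning strategy,
# taken from the sample adjustment table.
BALLOONING = {
    "no_movement":
        {"goal_gain": (-0.1, 0.0), "obstacle_gain": (-0.1, 0.0),
         "sphere": (-0.5, 0.0), "wander_gain": (0.1, 0.5), "persistence": (0, 1)},
    "progress":
        {"goal_gain": (0.5, 1.0), "obstacle_gain": (-0.1, 0.0),
         "sphere": (-0.5, 0.0), "wander_gain": (-0.1, 0.0), "persistence": (-1, 0)},
    "no_progress_with_obstacles":
        {"goal_gain": (-0.1, 0.0), "obstacle_gain": (0.1, 0.5),
         "sphere": (0.0, 0.5), "wander_gain": (0.0, 0.1), "persistence": (0, 1)},
    "no_progress_without_obstacles":
        {"goal_gain": (0.0, 0.3), "obstacle_gain": (-0.1, 0.0),
         "sphere": (0.0, 0.5), "wander_gain": (-0.2, 0.0), "persistence": (-1, 0)},
}

def adjust(params, situation, deltas=BALLOONING):
    """Add a uniform random delta from each parameter's range.
    Clamping below at zero is an assumption (gains stay non-negative)."""
    for name, (low, high) in deltas[situation].items():
        params[name] = max(0.0, params[name] + random.uniform(low, high))
    return params
```

Note how the "progress" row rewards whatever is working (goal gain grows, noise shrinks), while "no progress with obstacles" grows the sphere of influence, which is exactly the ballooning behavior described next.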
Two Possible Strategies
- Ballooning: the sphere of influence is increased when obstacles impede progress; the robot moves around large objects.
- Squeezing: the sphere of influence is decreased when obstacles impede progress; the robot moves between closely spaced objects.
Integration: Base System
(Diagram: sensors feed position and goal information to Move To Goal (Gm), and obstacle information to Avoid Obstacles (Go, S) and Wander (Gw, P); the controller sums (∑) the three behavior vectors into the output direction.)
Gm = goal gain, Go = obstacle gain, S = obstacle sphere of influence, Gw = wander gain, P = wander persistence.
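The ∑ node is just vector addition followed by extracting a heading. Assuming 2-D (x, y) behavior vectors, a minimal sketch:

```python
import math

def blend(vectors):
    """Sum the behavior vectors and return the resulting heading in radians."""
    vx = sum(v[0] for v in vectors)
    vy = sum(v[1] for v in vectors)
    return math.atan2(vy, vx)

# E.g. goal pulling east, one obstacle pushing north, wander quiet:
heading = blend([(1.0, 0.0), (0.0, 0.5), (0.0, 0.0)])
```

Because each vector's magnitude is scaled by its gain, changing Gm, Go, or Gw directly shifts which behavior dominates the blended heading.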
Integration: Integrated System
(Diagram: same as the base system, but an LM module monitors the sensor data and supplies new Gm, Go, S, Gw, and P parameters to the behaviors.)
Gm = goal gain, Go = obstacle gain, S = obstacle sphere of influence, Gw = wander gain, P = wander persistence.
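The LM module can be sketched as a small stateful object beside the controller: it averages recent movement, classifies the situation, and hands back updated parameters. The window size and delta values below are illustrative placeholders, not the paper's:

```python
class LearningMomentum:
    """Sits between sensing and the reactive controller; the controller
    itself is unchanged, so the system stays fully reactive."""

    def __init__(self, params, window=10):
        self.params = dict(params)
        self.window = window
        self.moves = []   # recent (movement, movement_toward_goal) pairs

    def update(self, movement, movement_to_goal, o_count):
        """Classify the current situation and nudge the parameters."""
        self.moves = (self.moves + [(movement, movement_to_goal)])[-self.window:]
        m = sum(a for a, _ in self.moves) / len(self.moves)
        p = (sum(b for _, b in self.moves) / len(self.moves)) / m if m else 0.0
        if m < 0.05:              # no movement: shake loose with more wander
            self.params["Gw"] += 0.2
        elif p > 0.5:             # progress: reinforce the pull toward the goal
            self.params["Gm"] += 0.1
        elif o_count > 2:         # blocked by obstacles: ballooning widens S
            self.params["S"] += 0.25
        else:                     # no progress, no obstacles: push toward goal
            self.params["Gm"] += 0.1
        return self.params
```

The controller would call `update(...)` once per step and feed the returned Gm, Go, S, Gw, and P into the behaviors on the next cycle.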
Experiments in Simulation
- 150m x 150m area; the robot moves from (10m, 10m) to (140m, 90m).
- Obstacle densities of 15% and 20% were used.
- Obstacle radii varied between 0.38m and 1.43m.
Ballooning
Observations on Ballooning
- Covers a lot of area.
- Not as easily trapped in box-canyon situations.
- May settle in locally clear areas.
- May require a high wander gain to carry the robot through closely spaced obstacles.
Squeezing
Observations on Squeezing
- Results in a straighter path.
- Moves easily through closely spaced obstacles.
- May get trapped in small box-canyon situations for long periods of time.
Simulations of the Real World
(Figure: simulated setup of the real-world environment, a 24m x 10m area with marked start and end places.)
Completion Rates for Simulation
(Bar charts: completion rates (%) for obstacle sets A–D and E–H.)

Uniform obstacle size (1m radii):
Bar | LM Strategy | Wander Gain | Wander Upper Limit
1   | None        | 0.3         | NA
2   | None        | 0.5         | NA
3   | None        | 1.0         | NA
4   | Ballooning  | NA          | 15
5   | Ballooning  | NA          | 10
6   | Squeezing   | NA          | 15

Varying obstacle sizes (0.38m – 1.43m radii):
Bar | LM Strategy | Wander Gain | Wander Delta Range
1   | None        | 0.5         | NA
2   | None        | 1.0         | NA
3   | Ballooning  | NA          | 0.0 – 0.1
4   | Ballooning  | NA          | 0.0 – 0.5
5   | Squeezing   | NA          | 0.0 – 0.1
6   | Squeezing   | NA          | 0.0 – 0.5
Average Steps to Completion
(Bar charts: average steps to completion for no LM, ballooning, and squeezing runs, for obstacle sets A–D and E–H. Bar legends and obstacle-size captions match those of the completion-rate charts.)
Results from Simulated Real Environment
(Bar charts: percent complete and steps to completion for No LM, ballooning, and squeezing runs.)
As before, there is an increase in completion rates with an accompanying increase in steps to completion.
Simulation Results
- Completion rates can be drastically improved.
- Completion-rate improvements come at a cost in time.
- The ballooning and squeezing strategies are geared toward different situations.
Physical Robot Experiments
- Nomad 150 robot with a sonar ring for obstacle avoidance.
- Traverses the length of a 24m x 10m room while negotiating obstacles.
Outdoor Run (adaptive)
Outdoor Run (non-adaptive)
Physical Experiment Results
- Non-learning robots became stuck.
- Learning robots successfully negotiated the obstacles.
- Squeezing was faster than ballooning in this case.
(Bar chart: average steps to goal.)
Conclusions
- Improved success comes at a price in time.
- A strategy performs very poorly in situations better suited to another strategy.
- The ballooning strategy is generally faster: ballooning robots can move through closely spaced objects faster than squeezing robots can move out of box-canyon situations.
Conclusions (cont'd)
- If some general knowledge of the terrain is known a priori, an appropriate strategy can be chosen.
- If the terrain is totally unknown, ballooning is probably the better choice.
- A way to dynamically switch strategies should improve performance.