Learning Momentum: Integration and Experimentation
Brian Lee and Ronald C. Arkin
Mobile Robot Laboratory, Georgia Tech, Atlanta, GA
Motivation
- It is hard to manually derive controller parameters: the parameter space grows exponentially with the number of parameters.
- A priori knowledge of the environment is not always available. Without it, a user cannot confidently choose appropriate parameter values, so the robot must adapt on its own to what it finds.
- Obstacle densities and layouts in the environment may be heterogeneous: parameters that work well in one type of environment may not work well in another.
Adaptation and Learning Methods – DARPA MARS
- Investigate robot shaping at five distinct levels in a hybrid robot software architecture
- Implement algorithms within the MissionLab mission specification system
- Conduct experiments to evaluate the performance of each technique
- Combine techniques where possible
- Integrate on a platform more suitable for realistic missions and continue development
Overview of Techniques
- CBR Wizardry – guide the operator
- Probabilistic Planning – manage complexity for the operator
- RL for Behavioral Assemblage Selection – learn what works for the robot
- CBR for Behavior Transitions – adapt to situations the robot can recognize
- Learning Momentum – vary robot parameters in real time

The learning continuum (figure): deliberative (premission) ... behavioral switching ... reactive (online adaptation).
Basic Concepts of LM
- Provides adaptability to behavior-based systems; a crude form of reinforcement learning.
- If the robot is doing well, keep doing what it is doing; otherwise, try something different.
- Behavior parameters are changed in response to progress and obstacles.
- The system is still fully reactive: although the robot changes its behavior, there is no deliberation.
Currently Used Behaviors
- Move to Goal: always returns a vector pointing toward the goal position.
- Avoid Obstacles: returns a sum of weighted vectors pointing away from obstacles.
- Wander: returns vectors pointing in random directions.
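Each behavior produces a single vector. A minimal Python sketch of the three (the function names, the linear repulsion falloff, and the mutable-default wander state are my assumptions, not MissionLab's implementation):

```python
import math
import random

def move_to_goal(robot_pos, goal_pos, gain):
    """Unit vector toward the goal, scaled by the goal gain."""
    dx, dy = goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1]
    dist = math.hypot(dx, dy) or 1.0   # avoid divide-by-zero at the goal
    return (gain * dx / dist, gain * dy / dist)

def avoid_obstacles(robot_pos, obstacles, gain, sphere):
    """Sum of weighted vectors pointing away from each obstacle inside the
    sphere of influence (linear falloff with distance is an assumption)."""
    vx = vy = 0.0
    for ox, oy in obstacles:
        dx, dy = robot_pos[0] - ox, robot_pos[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < sphere:
            weight = gain * (sphere - dist) / sphere  # closer -> stronger push
            vx += weight * dx / dist
            vy += weight * dy / dist
    return (vx, vy)

def wander(gain, persistence, _state={"step": 0, "heading": 0.0}):
    """Random-direction vector, held for `persistence` consecutive steps.
    (Mutable-default state is used only to keep the sketch short.)"""
    if _state["step"] % persistence == 0:
        _state["heading"] = random.uniform(0.0, 2.0 * math.pi)
    _state["step"] += 1
    return (gain * math.cos(_state["heading"]), gain * math.sin(_state["heading"]))
```

Each function returns an (x, y) vector whose magnitude is controlled by its gain, which is what makes these behaviors adjustable by LM.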
Adjustable Parameters
- Move-to-goal vector gain
- Avoid-obstacle vector gain
- Avoid-obstacle sphere of influence: the radius around the robot inside which obstacles are perceived
- Wander vector gain
- Wander persistence: the number of consecutive steps the wander vector points in the same direction
Four Predefined Situations
Let M = average movement, M_goal = average movement toward the goal, P = M_goal / M, and O_count = obstacles encountered; T_movement, T_progress, and T_obstacles are the corresponding thresholds.
- No movement: M < T_movement
- Progress toward the goal: M > T_movement, P > T_progress
- No progress with obstacles: M > T_movement, P < T_progress, O_count > T_obstacles
- No progress without obstacles: M > T_movement, P < T_progress, O_count < T_obstacles
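Under those definitions the situation test is a short cascade. A sketch (the threshold values here are illustrative placeholders, not the paper's):

```python
# Illustrative thresholds; the actual values used in the paper are not given here.
T_MOVEMENT, T_PROGRESS, T_OBSTACLES = 0.1, 0.5, 3

def classify(m_avg, m_goal_avg, o_count):
    """Map averaged readings onto one of the four predefined situations.
    m_avg = M, m_goal_avg = M_goal, o_count = O_count."""
    p = m_goal_avg / m_avg if m_avg > 0 else 0.0   # P = M_goal / M
    if m_avg < T_MOVEMENT:
        return "no_movement"
    if p > T_PROGRESS:
        return "progress"
    if o_count > T_OBSTACLES:
        return "no_progress_with_obstacles"
    return "no_progress_without_obstacles"
```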
Parameter Adjustments

Situation                   | Goal Gain   | Obstacle Gain | Sphere      | Noise Gain  | Persistence
No Movement                 | -0.1 to 0.0 | -0.1 to 0.0   | -0.5 to 0.0 |  0.1 to 0.5 |  0 to 1
Progress                    |  0.5 to 1.0 | -0.1 to 0.0   | -0.5 to 0.0 | -0.1 to 0.0 | -1 to 0
No Progress w/ Obstacles    | -0.1 to 0.0 |  0.1 to 0.5   |  0.0 to 0.5 |  0.0 to 0.1 |  0 to 1
No Progress w/out Obstacles |  0.0 to 0.3 | -0.1 to 0.0   |  0.0 to 0.5 | -0.2 to 0.0 | -1 to 0

Sample adjustment parameters for ballooning.
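Applied each step, an adjustment amounts to drawing a random delta from the range given for the current situation and adding it to each parameter. A sketch using the ballooning values above (the dictionary layout and the clamp at zero are my assumptions):

```python
import random

# Delta ranges (low, high) per parameter for the ballooning strategy,
# taken from the sample adjustment table.
BALLOONING = {
    "no_movement":
        {"goal_gain": (-0.1, 0.0), "obstacle_gain": (-0.1, 0.0),
         "sphere": (-0.5, 0.0), "wander_gain": (0.1, 0.5), "persistence": (0, 1)},
    "progress":
        {"goal_gain": (0.5, 1.0), "obstacle_gain": (-0.1, 0.0),
         "sphere": (-0.5, 0.0), "wander_gain": (-0.1, 0.0), "persistence": (-1, 0)},
    "no_progress_with_obstacles":
        {"goal_gain": (-0.1, 0.0), "obstacle_gain": (0.1, 0.5),
         "sphere": (0.0, 0.5), "wander_gain": (0.0, 0.1), "persistence": (0, 1)},
    "no_progress_without_obstacles":
        {"goal_gain": (0.0, 0.3), "obstacle_gain": (-0.1, 0.0),
         "sphere": (0.0, 0.5), "wander_gain": (-0.2, 0.0), "persistence": (-1, 0)},
}

def adjust(params, situation, deltas=BALLOONING):
    """Add a uniform random delta from each parameter's range.
    Clamping below at zero is an assumption (gains stay non-negative)."""
    for name, (low, high) in deltas[situation].items():
        params[name] = max(0.0, params[name] + random.uniform(low, high))
    return params
```

Note how the "progress" row rewards whatever is working (goal gain grows, noise shrinks), while "no progress with obstacles" grows the sphere of influence, which is exactly the ballooning behavior described next.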
Two Possible Strategies
- Ballooning: the sphere of influence is increased when obstacles impede progress; the robot moves around large objects.
- Squeezing: the sphere of influence is decreased when obstacles impede progress; the robot moves between closely spaced objects.
Integration: Base System
(Diagram: sensors feed position and goal information to Move To Goal (Gm), and obstacle information to Avoid Obstacles (Go, S) and Wander (Gw, P); the controller sums (∑) the three behavior vectors into the output direction.)
Gm = goal gain, Go = obstacle gain, S = obstacle sphere of influence, Gw = wander gain, P = wander persistence.
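The ∑ node is just vector addition followed by extracting a heading. Assuming 2-D (x, y) behavior vectors, a minimal sketch:

```python
import math

def blend(vectors):
    """Sum the behavior vectors and return the resulting heading in radians."""
    vx = sum(v[0] for v in vectors)
    vy = sum(v[1] for v in vectors)
    return math.atan2(vy, vx)

# E.g. goal pulling east, one obstacle pushing north, wander quiet:
heading = blend([(1.0, 0.0), (0.0, 0.5), (0.0, 0.0)])
```

Because each vector's magnitude is scaled by its gain, changing Gm, Go, or Gw directly shifts which behavior dominates the blended heading.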
Integration: Integrated System
(Diagram: same as the base system, but an LM module monitors the sensor data and supplies new Gm, Go, S, Gw, and P parameters to the behaviors.)
Gm = goal gain, Go = obstacle gain, S = obstacle sphere of influence, Gw = wander gain, P = wander persistence.
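The LM module can be sketched as a small stateful object beside the controller: it averages recent movement, classifies the situation, and hands back updated parameters. The window size and delta values below are illustrative placeholders, not the paper's:

```python
class LearningMomentum:
    """Sits between sensing and the reactive controller; the controller
    itself is unchanged, so the system stays fully reactive."""

    def __init__(self, params, window=10):
        self.params = dict(params)
        self.window = window
        self.moves = []   # recent (movement, movement_toward_goal) pairs

    def update(self, movement, movement_to_goal, o_count):
        """Classify the current situation and nudge the parameters."""
        self.moves = (self.moves + [(movement, movement_to_goal)])[-self.window:]
        m = sum(a for a, _ in self.moves) / len(self.moves)
        p = (sum(b for _, b in self.moves) / len(self.moves)) / m if m else 0.0
        if m < 0.05:              # no movement: shake loose with more wander
            self.params["Gw"] += 0.2
        elif p > 0.5:             # progress: reinforce the pull toward the goal
            self.params["Gm"] += 0.1
        elif o_count > 2:         # blocked by obstacles: ballooning widens S
            self.params["S"] += 0.25
        else:                     # no progress, no obstacles: push toward goal
            self.params["Gm"] += 0.1
        return self.params
```

The controller would call `update(...)` once per step and feed the returned Gm, Go, S, Gw, and P into the behaviors on the next cycle.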
Experiments in Simulation
- 150m x 150m area; the robot moves from (10m, 10m) to (140m, 90m).
- Obstacle densities of 15% and 20% were used.
- Obstacle radii varied between 0.38m and 1.43m.
Ballooning
Observations on Ballooning
- Covers a lot of area.
- Not as easily trapped in box-canyon situations.
- May settle in locally clear areas.
- May require a high wander gain to carry the robot through closely spaced obstacles.
Squeezing
Observations on Squeezing
- Results in a straighter path.
- Moves easily through closely spaced obstacles.
- May get trapped in small box-canyon situations for long periods of time.
Simulations of the Real World
(Figure: simulated setup of the real-world environment, a 24m x 10m area with marked start and end places.)
Completion Rates for Simulation
(Bar charts: completion rates (%) for obstacle sets A–D and E–H.)

Uniform obstacle size (1m radii):
Bar | LM Strategy | Wander Gain | Wander Upper Limit
1   | None        | 0.3         | NA
2   | None        | 0.5         | NA
3   | None        | 1.0         | NA
4   | Ballooning  | NA          | 15
5   | Ballooning  | NA          | 10
6   | Squeezing   | NA          | 15

Varying obstacle sizes (0.38m – 1.43m radii):
Bar | LM Strategy | Wander Gain | Wander Delta Range
1   | None        | 0.5         | NA
2   | None        | 1.0         | NA
3   | Ballooning  | NA          | 0.0 – 0.1
4   | Ballooning  | NA          | 0.0 – 0.5
5   | Squeezing   | NA          | 0.0 – 0.1
6   | Squeezing   | NA          | 0.0 – 0.5
Average Steps to Completion
(Bar charts: average steps to completion for no LM, ballooning, and squeezing runs, for obstacle sets A–D and E–H. Bar legends and obstacle-size captions match those of the completion-rate charts.)
Results from Simulated Real Environment
(Bar charts: percent complete and steps to completion for No LM, ballooning, and squeezing runs.)
As before, there is an increase in completion rates with an accompanying increase in steps to completion.
Simulation Results
- Completion rates can be drastically improved.
- Completion-rate improvements come at a cost in time.
- The ballooning and squeezing strategies are geared toward different situations.
Physical Robot Experiments
- Nomad 150 robot with a sonar ring for obstacle avoidance.
- Traverses the length of a 24m x 10m room while negotiating obstacles.
Outdoor Run (adaptive)
Outdoor Run (non-adaptive)
Physical Experiment Results
- Non-learning robots became stuck.
- Learning robots successfully negotiated the obstacles.
- Squeezing was faster than ballooning in this case.
(Bar chart: average steps to goal.)
Conclusions
- Improved success comes at a price in time.
- A strategy performs very poorly in situations better suited to another strategy.
- The ballooning strategy is generally faster: ballooning robots can move through closely spaced objects faster than squeezing robots can move out of box-canyon situations.
Conclusions (cont'd)
- If some general knowledge of the terrain is known a priori, an appropriate strategy can be chosen.
- If the terrain is totally unknown, ballooning is probably the better choice.
- A way to dynamically switch strategies should improve performance.