DARPA Mobile Autonomous Robot Software. Leslie Pack Kaelbling; March 2001

Adaptive Intelligent Mobile Robotics

Leslie Pack Kaelbling

Artificial Intelligence Laboratory

MIT

Pyramid

• Addressing problem at multiple levels

[Figure: pyramid with levels labeled Planning, Built-in Behaviors, and Learning]

Built-in Behaviors

Goal: general-purpose, robust visually guided local navigation

• optical flow for depth information
• finding the floor
    • optical flow information
    • Horswill’s ground-plane method
• build local occupancy grids
• navigate given the grid
    • reactive methods
    • dynamic programming

Reactive Obstacle Avoidance

Standard method in mobile robotics is to use potential fields:

• attractive force toward goal
• repulsive forces away from obstacles
• robot moves in direction given by resultant force

New method for non-holonomic robots: move the center of the robot so that the front point is holonomic
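
A minimal sketch of the two ideas on this slide, under stated assumptions. The first function computes a resultant force from an attractive term and a classic inverse-distance repulsive term. The second maps that force onto the forward speed and turn rate of a differential-drive robot by treating a point a distance `lookahead` ahead of the axle as the controlled point, which is one common reading of "the front point is holonomic". The gains, the influence radius, and all function names are illustrative and do not come from the slides.

```python
import math

def potential_field_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """Attractive force toward the goal plus repulsive forces from
    obstacles closer than `influence` (all gains are assumptions)."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Repulsion grows as the obstacle gets closer.
            mag = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

def front_point_command(force, theta, lookahead=0.3):
    """Choose (v, omega) so the point `lookahead` ahead of the axle
    moves with velocity `force`; that point can move in any direction."""
    fx, fy = force
    v = fx * math.cos(theta) + fy * math.sin(theta)
    omega = (-fx * math.sin(theta) + fy * math.cos(theta)) / lookahead
    return v, omega

def simulate(start, theta, goal, obstacles, steps=2000, dt=0.01, lookahead=0.3):
    """Euler-integrate the robot and return the final front-point position."""
    x, y = start
    for _ in range(steps):
        front = (x + lookahead * math.cos(theta), y + lookahead * math.sin(theta))
        force = potential_field_force(front, goal, obstacles)
        v, omega = front_point_command(force, theta, lookahead)
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += omega * dt
    return (x + lookahead * math.cos(theta), y + lookahead * math.sin(theta))
```

With no obstacles, the front point follows the attractive force directly, so `simulate((0.0, 0.0), 0.0, (3.0, 2.0), [])` ends with the front point close to the goal even though the robot body cannot move sideways.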

Human Obstacle Avoidance

Control law based on visual angle and distance to goal and obstacles

Parameters set based on experiments with humans in large free-walking VR environment
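
The slides do not give the control law itself. As a sketch of what a law "based on visual angle and distance" can look like, the function below follows the form of the Fajen and Warren steering dynamics model: a damped turning acceleration that is attracted to the goal bearing and repelled from obstacle bearings, with both effects scaled by distance. Treating that model as the one meant here is an assumption, and the parameter values are illustrative placeholders rather than the fitted values the slide refers to.

```python
import math

def angular_acceleration(phi, phi_dot, goal, obstacles,
                         b=3.25, k_g=7.5, c1=0.4, c2=0.4,
                         k_o=198.0, c3=6.5, c4=0.8):
    """Turning acceleration for heading `phi` (radians).

    goal      -- (bearing, distance) of the goal
    obstacles -- list of (bearing, distance) pairs
    Bearings share the frame of `phi`; all gains are assumptions.
    """
    psi_g, d_g = goal
    acc = -b * phi_dot                                        # damping
    acc -= k_g * (phi - psi_g) * (math.exp(-c1 * d_g) + c2)   # turn toward goal
    for psi_o, d_o in obstacles:
        err = phi - psi_o
        # Repulsion fades with angular offset and with distance.
        acc += k_o * err * math.exp(-c3 * abs(err)) * math.exp(-c4 * d_o)
    return acc
```

A goal to the left of the current heading yields a positive (leftward) acceleration, and an obstacle slightly to the left pushes the heading to the right, more strongly when the obstacle is near.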

Humans are Smooth!

Behavior Learning

Typical RL methods require far too much data to be practical in an online setting. Address the problem with:

• strong generalization techniques
    • locally weighted regression
    • “skeptical” Q-Learning
• bootstrapping from human-supplied policy
    • need not be optimal and might be very wrong
    • shows learner “interesting” parts of the space
    • “bad” initial policies might be more effective
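
To illustrate the generalization step, here is a minimal sketch of locally weighted regression for a one-dimensional input: each stored sample is weighted by a Gaussian kernel centered on the query, and a weighted straight line is fitted and evaluated at the query. The function name, the kernel choice, the `bandwidth` value, and the `default` returned when no sample lies near the query are all assumptions made for illustration. The sketch does not implement the "skeptical" Q-learning rule named on the slide.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.5, default=0.0):
    """Locally weighted linear regression at one 1-D query point."""
    # Gaussian kernel weights: nearby samples dominate the fit.
    w = [math.exp(-((x - query) / bandwidth) ** 2 / 2.0) for x in xs]
    sw = sum(w)
    if sw < 1e-12:
        # No sample is close enough to say anything about the query.
        return default
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    # Evaluate the locally fitted line at the query.
    return my + slope * (query - mx)
```

In a value-learning setting the samples would be stored state-action inputs with their current value estimates, so that a query at an unvisited input is answered from its neighbors rather than from a table entry.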

Two Learning Phases

[Figure: Phase One. Block diagram connecting the Learning System, the Supplied Control Policy, and the Environment, with signals labeled A, R, and O.]

Two Learning Phases

[Figure: Phase Two. Block diagram connecting the Learning System, the Supplied Control Policy, and the Environment, with signals labeled A, R, and O.]
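
A minimal sketch of two-phase learning, assuming that in Phase One the supplied control policy chooses the actions while the learning system only observes and updates, and that in Phase Two the learning system chooses the actions itself. For brevity it uses tabular Q-learning on a hypothetical six-state corridor instead of the continuous-state learner the slides describe. The environment, the deliberately imperfect supplied policy, and every constant are illustrative.

```python
import random

N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """Deterministic corridor; reaching the goal pays +10."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (10.0 if nxt == GOAL else 0.0), nxt == GOAL

def supplied_policy(state, rng):
    # Far from optimal on purpose: heads right only 70% of the time.
    return +1 if rng.random() < 0.7 else -1

def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])

def run_episode(q, rng, learner_in_control, epsilon=0.1, max_steps=200):
    s = 0
    for t in range(max_steps):
        if not learner_in_control:
            a = supplied_policy(s, rng)           # Phase One: learner watches
        elif rng.random() < epsilon:
            a = rng.choice(ACTIONS)               # Phase Two: explore
        else:
            a = max(ACTIONS, key=lambda b: q[(s, b)])  # Phase Two: exploit
        s2, r, done = step(s, a)
        q_update(q, s, a, r, s2, done)            # learning happens in both phases
        s = s2
        if done:
            return t + 1
    return max_steps

def train(phase_one=30, phase_two=30, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(phase_one):
        run_episode(q, rng, learner_in_control=False)
    for _ in range(phase_two):
        run_episode(q, rng, learner_in_control=True)
    return q

def greedy_steps(q, max_steps=50):
    """Steps the learned greedy policy needs from state 0 to the goal."""
    s, steps = 0, 0
    while s != GOAL and steps < max_steps:
        a = max(ACTIONS, key=lambda b: q[(s, b)])
        s, _, _ = step(s, a)
        steps += 1
    return steps
```

The supplied policy wanders, but it reliably exposes the learner to the rewarding part of the state space, so the learned greedy policy can end up better than the policy that trained it.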

New Results

Drive to goal, avoiding obstacles in visual field

Inputs (6 dimensions):
• heading and distance to goal
• image coordinates of two obstacles

Output:
• steering angle

Reward:
• +10 for getting to goal; -5 for running over obstacle

Training: simple policy that avoids one obstacle

Robot’s View

Local Navigation

[Figure: average steps to goal. x-axis: training runs, across Phase 1 and Phase 2; y-axis: steps to goal; series: JAQL, Optimal, Trainer.]

Map Learning

Robot learns high-level structure of environment
• topological maps appropriate for large-scale structure
• low-level behaviors induce topology
• based on previous work using sonar
• vision changes problem dramatically
    • no more problems with many states looking the same
    • now same state always looks different!
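
One way to picture "low-level behaviors induce topology" is a graph whose nodes are places and whose edges are the behaviors that carry the robot from one place to the next. The sketch below stores such a map as a dictionary and plans a route with breadth-first search. The place names, behavior names, and the `plan_route` helper are hypothetical, and the sketch says nothing about how the map itself is learned.

```python
from collections import deque

# Hypothetical places and behaviors, for illustration only.
TOPOLOGY = {
    "lab": {"follow-corridor-east": "hallway"},
    "hallway": {
        "follow-corridor-west": "lab",
        "enter-door": "office",
        "follow-corridor-east": "stairs",
    },
    "office": {"exit-door": "hallway"},
    "stairs": {"follow-corridor-west": "hallway"},
}

def plan_route(topology, start, goal):
    """Breadth-first search returning the behaviors to run, in order."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        place, behaviors = queue.popleft()
        if place == goal:
            return behaviors
        for behavior, nxt in topology[place].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, behaviors + [behavior]))
    return None  # goal not reachable from start
```

For example, `plan_route(TOPOLOGY, "lab", "office")` returns the two behaviors that lead from the lab through the hallway into the office.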

Sonar-Based Map Learning

[Figure: two panels, Data and True Model]

Current Issues in Map Learning

• segmenting space into “rooms”
• detecting doors and corridor openings
• representation of places
    • stored images
    • gross 3D structure
    • features for image and structure matching

Large Simulation Domain

Use for learning and large-scale experimentation that is impractical on a real robot

• built using video-game engine
• large multi-story building
• packages to deliver
• battery power management
• other agents (to survey)
• dynamically appearing items to collect
• general Bayes-net specification so it can be used widely as a test bed

Hierarchical MDP Planning

Large simulated domain has unspeakably many primitive states

Use hierarchical representation for planning
• logarithmic improvement in planning times
• some loss of optimality of plans

Existing work on planning and learning given a hierarchy
• temporal abstraction: macro actions
• spatial abstraction: aggregated states

Where does the hierarchy come from?
• combined spatial and temporal abstraction
• top-down splitting approach

Region-Based Hierarchies

Divide state space into regions
• each region is a single abstract state at next level
• policies for moving through regions are abstract actions at next level
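
A small sketch of the spatial half of this idea, assuming a grid world split into square blocks. Each block is one region (an abstract state), and a cell counts as an exit state of its region when one of its four neighbors lies in a different region. The block size, the grid, and the boundary-cell definition of "exit state" are assumptions for illustration.

```python
def region_of(cell, size=3):
    """Abstract state for a grid cell: the index of its square block."""
    r, c = cell
    return (r // size, c // size)

def exit_states(rows, cols, size=3):
    """Map each region to the cells from which the robot can leave it."""
    exits = {}
    for r in range(rows):
        for c in range(cols):
            here = region_of((r, c), size)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and region_of((nr, nc), size) != here:
                    exits.setdefault(here, set()).add((r, c))
    return exits
```

On a 6 by 6 grid with blocks of size 3 this yields four regions, and a policy that drives the robot from anywhere in a region to one of its exit cells plays the role of an abstract action at the next level.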

Choosing Macros

Given a choice of a region, what is a good set of macro actions for traversing it?

• existing approaches guarantee optimality with a number of macros exponential in the number of exit states

• our method is approximate, but works well when there are no large rewards inside the region

Point-Source Rewards

• Compute a value function for each possible exit state, offline
• Given a new valuation of all exit states online
    • Quickly combine value functions to determine near-optimal action
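
The slides do not say how the per-exit value functions are combined. The sketch below assumes a deterministic region, a discount factor, and a simple combination rule: the value of a state is the best, over exits, of the exit's online value scaled by the precomputed unit-reward value function for that exit. The rule, the discount, the corridor example, and the function names are assumptions for illustration.

```python
from collections import deque

GAMMA = 0.95  # assumed discount factor

def exit_value_function(neighbors, exit_state):
    """Offline: value of every state when a unit reward sits at one exit.
    With deterministic moves this is GAMMA ** (shortest distance)."""
    dist = {exit_state: 0}
    queue = deque([exit_state])
    while queue:
        s = queue.popleft()
        for n in neighbors[s]:
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return {s: GAMMA ** d for s, d in dist.items()}

def combined_value(value_fns, exit_values, state):
    """Online: scale each stored value function by its exit's new value."""
    return max(exit_values[e] * value_fns[e].get(state, 0.0) for e in exit_values)

def choose_move(neighbors, value_fns, exit_values, state):
    """Pick the neighboring state with the highest combined value."""
    return max(neighbors[state],
               key=lambda n: combined_value(value_fns, exit_values, n))

# Hypothetical region: a five-cell corridor with exits at both ends.
CORRIDOR = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
VALUE_FNS = {e: exit_value_function(CORRIDOR, e) for e in (0, 4)}
```

With exit values `{0: 1.0, 4: 10.0}` the robot at cell 1 heads for the far exit, and swapping the two values reverses the choice without recomputing any value function.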

Approximation is Good

[Figure: value (y-axis) versus distance between point sources (x-axis); series: Optimal, Point Source.]

How to Use the Hierarchy

Off line:
• Decompose environment into abstract states
• Compute macro operators

On line:
• Given new goal, assign values to exits at highest level
• Propagate values at each level
• In current low-level region, choose action
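
The on-line steps can be sketched on a single abstract level, assuming the off-line phase has already produced a graph of regions in which each edge is a macro operator annotated with its duration in primitive steps. Given a goal region, values are propagated over the graph by value iteration, and the robot picks the macro with the best discounted value from its current region. The region names, durations, discount, and function names are hypothetical.

```python
def propagate_values(abstract_edges, goal_region, gamma=0.9, sweeps=50):
    """Value iteration over abstract states; a macro that takes `steps`
    primitive steps discounts its successor's value by gamma ** steps."""
    values = {region: 0.0 for region in abstract_edges}
    values[goal_region] = 1.0
    for _ in range(sweeps):
        for region, macros in abstract_edges.items():
            if region == goal_region:
                continue
            values[region] = max(
                (gamma ** steps * values[nxt] for nxt, steps in macros.items()),
                default=0.0,
            )
    return values

def choose_macro(abstract_edges, values, region, gamma=0.9):
    """Pick the macro (named by its destination) with the best value."""
    macros = abstract_edges[region]
    return max(macros, key=lambda nxt: gamma ** macros[nxt] * values[nxt])

# Hypothetical result of the off-line phase: region -> {next region: steps}.
ABSTRACT_EDGES = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "D": 2},
    "C": {"A": 1, "D": 10},
    "D": {"B": 2, "C": 10},
}
```

With `D` as the goal, the robot in `A` chooses the macro toward `B`, and the robot in `C` prefers to go back through `A` instead of taking its long direct macro to `D`. Only the cheap propagation step is repeated when the goal changes.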

What Makes a Decomposition Good?

Trade off
• decrease in off-line planning time
• decrease in on-line planning time
• decrease in value of actions

We can articulate this criterion formally but…

… we can’t solve it

Current research on reasonable approximations

Next Steps

Low-level
• apply JAQL to tune obstacle avoidance behaviors

Map learning
• landmark selection and representation
• visual detection of openings

Hierarchy
• algorithm for constructing decomposition
• test hierarchical planning on huge simulated domain