04/19/23 DARPA-MARS Kickoff
Adaptive Intelligent Mobile Robots
Leslie Pack Kaelbling
Artificial Intelligence Laboratory
MIT
Two projects
Making reinforcement learning work on real robots
Solving huge problems: dynamic problem reformulation and explicit uncertainty management
Reinforcement learning
Given a connection to the environment, find a behavior that maximizes long-run reinforcement
[Figure: the agent sends actions to the environment and receives observations and reinforcement]
Why reinforcement learning?
Unknown or changing environments
Easier for a human to provide a reinforcement function than a whole behavior
Q-Learning
Learn to choose actions because of their long-term consequences
Given experience $\langle s, a, r, s' \rangle$, update
$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a')\right)$$
Given a state $s$, take the action $a$ that maximizes $Q(s,a)$.
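A minimal tabular sketch of the Q-learning update and greedy action choice on this slide. The dictionary-backed Q table, the function names `q_update` and `greedy_action`, and the defaults α = 0.1 and γ = 0.9 are illustrative assumptions, not details from the talk.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning backup for the experience tuple (s, a, r, s_next)."""
    best_next = max(Q[(s_next, b)] for b in actions)  # max over a' of Q(s', a')
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]

def greedy_action(Q, s, actions):
    """Return the action a that maximizes Q(s, a); ties go to the first action listed."""
    return max(actions, key=lambda a: Q[(s, a)])

# Usage with hypothetical state and action labels; unseen entries default to 0.
Q = defaultdict(float)
actions = ["left", "right"]
q_update(Q, "s0", "right", 1.0, "s1", actions)   # Q[("s0", "right")] becomes 0.1
print(greedy_action(Q, "s0", actions))           # prints "right"
```

Because each backup only looks one step ahead through $\max_{a'} Q(s',a')$, a reward has to be re-experienced many times before it reaches distant states. This is the slowness in propagating values noted on the following slides.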
Does it Work?
Yes and no.
Successes in simulated domains: backgammon, elevator scheduling
Successes in manufacturing and juggling with strong constraints
No strong successes in more general online robotic learning
Why is RL on robots hard?
Need fast, robust supervised learning
Continuous input and action spaces
Q-learning slow to propagate values
Need strong exploration bias
Making RL on robots easier
Need fast, robust supervised learning: locally weighted regression
Continuous input and action spaces: search and caching of optimal action
Q-learning slow to propagate values: model-based acceleration
Need strong exploration bias: start with human-supplied policy
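One possible reading of "search and caching of optimal action" for continuous actions is to search the action range for the Q-maximizing action and reuse that answer for nearby states. The sketch below assumes a one-dimensional action interval, a uniform grid search, and a cache keyed on states rounded to a fixed number of decimals. The names `make_cached_argmax`, `n_samples`, and `state_round` are hypothetical.

```python
def make_cached_argmax(q, a_low, a_high, n_samples=101, state_round=2):
    """Build best_action(state): grid-search argmax_a q(state, a) over [a_low, a_high],
    caching the answer per state rounded to `state_round` decimals."""
    cache = {}
    step = (a_high - a_low) / (n_samples - 1)
    candidates = [a_low + i * step for i in range(n_samples)]

    def best_action(state):
        key = tuple(round(x, state_round) for x in state)
        if key not in cache:
            # Uniform search over the continuous action interval (an assumption;
            # any optimizer over a could be used here instead).
            cache[key] = max(candidates, key=lambda a: q(state, a))
        return cache[key]

    best_action.cache = cache  # must be cleared whenever q changes
    return best_action

# Usage with a hypothetical Q function whose best action is a = s[0].
best = make_cached_argmax(lambda s, a: -(a - s[0]) ** 2, -1.0, 1.0)
print(best((0.3,)))
```

The cache is only valid while Q stays fixed, so a learner that updates Q would call `best.cache.clear()` after each update or batch of updates.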
Start with human-provided policy
[Figure: a human policy receives the state from the environment and chooses the action]
Do supervised policy learning
[Figure: while the human policy drives, its (s, a) pairs are used to train a learned policy]
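The supervised policy-learning step could use locally weighted regression, which the talk lists as the remedy for needing fast, robust supervised learning. Below is a sketch of that approach applied to the human's (s, a) pairs. It assumes NumPy, a Gaussian kernel with a fixed bandwidth, and a locally linear fit with a small ridge term. The class `LWRPolicy` and its parameters are hypothetical.

```python
import numpy as np

class LWRPolicy:
    """Locally weighted linear regression from states to actions."""

    def __init__(self, bandwidth=0.5, ridge=1e-6):
        self.bandwidth = bandwidth  # Gaussian kernel width (assumed fixed)
        self.ridge = ridge          # small regularizer for numerical stability
        self.states, self.actions = [], []

    def add(self, s, a):
        """Store one demonstrated (state, action) pair."""
        self.states.append(np.asarray(s, dtype=float))
        self.actions.append(np.atleast_1d(np.asarray(a, dtype=float)))

    def predict(self, s):
        """Fit a kernel-weighted linear model centred on s and return its value at s."""
        S = np.vstack(self.states)                       # (n, d)
        A = np.vstack(self.actions)                      # (n, k)
        s = np.asarray(s, dtype=float)
        w = np.exp(-np.sum((S - s) ** 2, axis=1) / (2 * self.bandwidth ** 2))
        X = np.hstack([S - s, np.ones((len(S), 1))])     # inputs centred on s, plus bias
        XtW = X.T * w                                    # X^T diag(w)
        beta = np.linalg.solve(XtW @ X + self.ridge * np.eye(X.shape[1]), XtW @ A)
        return beta[-1]                                  # intercept = prediction at s

policy = LWRPolicy(bandwidth=0.3)
for i in range(11):            # hypothetical demonstrations: a = 2*s + 1
    s = i / 10
    policy.add([s], 2 * s + 1)
print(policy.predict([0.35]))  # close to [1.7]
```

LWR keeps all the data and does the fitting at query time. This makes it quick to absorb new demonstrations, at the cost of query time that grows with the number of stored pairs.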
When the policy is learned, let it drive
[Figure: the learned policy, rather than the human policy, now selects actions in the environment]
Q-learning
[Figure: while the learned policy drives, an RL module trains a Q-value function mapping (s, a) to a value v]
Acting based on Q values
[Figure: the state s is fed to one Q-value estimator per action a1, ..., an; the index of the maximum output is the chosen action a]
Letting the Q-learner drive
[Figure: actions are now chosen by taking the max over the learned Q-values for the current state]
Train policy with max Q values
[Figure: the Q-maximizing action for each state is used to train the policy]
Add model learning
[Figure: a model is trained from experience to predict the next state s′ and reward r from (s, a), alongside the Q-value function and the policy]
When the model is good, train Q with it
[Figure: the learned model supplies simulated transitions (s′, a′) for training the Q-value function]
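The slides do not say exactly how the learned model is used to train Q. One standard option is Dyna-style planning: after each real step, run extra Q-learning backups on transitions replayed from the model. The sketch below assumes a deterministic tabular model and uniformly random replay. `dyna_q_step` and `n_planning` are hypothetical names. The check for whether the model is "good" is left out; passing `n_planning=0` turns planning off.

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions, n_planning=10,
                alpha=0.1, gamma=0.9, rng=random):
    """One real Q-learning backup, a model update, then n_planning simulated backups."""

    def backup(ps, pa, pr, ps_next):
        best_next = max(Q[(ps_next, b)] for b in actions)
        Q[(ps, pa)] = (1 - alpha) * Q[(ps, pa)] + alpha * (pr + gamma * best_next)

    backup(s, a, r, s_next)            # learn from the real transition
    model[(s, a)] = (r, s_next)        # deterministic tabular model (assumption)
    seen = list(model)
    for _ in range(n_planning):        # replay transitions predicted by the model
        ps, pa = rng.choice(seen)
        pr, ps_next = model[(ps, pa)]
        backup(ps, pa, pr, ps_next)

# Usage on a hypothetical two-step corridor: s0 -> s1 -> end, reward 1 at the end.
Q, model = defaultdict(float), {}
dyna_q_step(Q, model, "s0", "go", 0.0, "s1", ["go"], n_planning=20)
dyna_q_step(Q, model, "s1", "go", 1.0, "end", ["go"], n_planning=20)
print(Q[("s0", "go")])
```

With planning on, the reward seen on the second step can reach Q(s0, go) during that same call, because the model replays the s0 transition. Plain Q-learning would have to visit s0 again before its value changed.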
Other forms of human knowledge
hard safety constraints on action choices
partial models or constraints on models
value estimates or value orderings on states
We will have succeeded if
It takes less human effort and total development time to provide prior knowledge and to run and tune the learning algorithm than to write and debug the program without learning
Test domain
Indoor mobile-robot navigation and delivery tasks
quick adaptation to new buildings
quick adaptation to sensor change or failure
quick incorporation of human information
Solving huge problems
We have lots of good techniques for small- to medium-sized problems: reinforcement learning, probabilistic planning, Bayesian inference
Rather than scale them to tackle huge problems directly, formulate right-sized problems on the fly
Dynamic problem reformulation
[Figure: perception feeds a working memory, which in turn drives action]
Reformulation strategy
Dynamically swap variables in and out of working memory
keeps the problem at a constant size, so it is always tractable
adapts to changing situations, goals, etc.
Given more time pressure, decrease problem size
Given less time pressure, increase problem size
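A toy illustration of trading problem size against time pressure. It is entirely hypothetical: it assumes each candidate variable has a relevance score and that solve time grows exponentially with the number of variables in working memory. The function name, the cost model, and the variable names are illustrative assumptions, not details from the talk.

```python
def select_working_memory(relevance, time_budget, cost_per_state=1e-6, values_per_var=2):
    """Choose the most relevant variables whose joint problem fits the time budget.

    Assumes (hypothetically) that solving over k variables costs about
    cost_per_state * values_per_var ** k seconds.
    """
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    k = 0
    while k < len(ranked) and cost_per_state * values_per_var ** (k + 1) <= time_budget:
        k += 1
    return ranked[:k]

# Hypothetical relevance scores; more time pressure means a smaller problem.
relevance = {"door_state": 0.9, "battery": 0.7, "hall_clear": 0.5, "weather": 0.1}
print(select_working_memory(relevance, time_budget=1e-5))  # ['door_state', 'battery', 'hall_clear']
print(select_working_memory(relevance, time_budget=3e-6))  # ['door_state']
```

Re-running the selection whenever relevance scores or the deadline change is one simple way to swap variables in and out of working memory.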
Multiple-resolution plans
Fine view of near-term, high-probability events
Coarse view of distant, low-probability events
Information gathering
Explicit models of the robot’s uncertainty allow information-gathering actions:
drive to the top of a hill for a better view
open a door to see what’s inside
ask a human for guidance ("Where is the supply depot?" "Two miles up this road.")
Explicit uncertainty modeling
POMDP work gives us theoretical understanding
Derive practical solutions by:
learning explicit memorization policies
approximating optimal control
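Explicit uncertainty modeling in a POMDP rests on the belief update: $b'(s') \propto O(o \mid s', a) \sum_s T(s' \mid s, a)\, b(s)$. Below is a minimal discrete sketch of that update. The dictionary layout for `T` and `O` and the door example are illustrative assumptions. The example shows how an information-gathering action ("look") sharpens the robot's belief.

```python
def belief_update(belief, action, observation, T, O):
    """Discrete POMDP belief update (Bayes filter).

    belief: dict state -> probability
    T[(s, action)]: dict next_state -> probability
    O[(s_next, action)]: dict observation -> probability
    """
    predicted = {}
    for s, p in belief.items():                       # prediction step
        for s_next, pt in T[(s, action)].items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * pt
    unnorm = {s2: p * O[(s2, action)].get(observation, 0.0)   # correction step
              for s2, p in predicted.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s2: p / total for s2, p in unnorm.items()}

# Hypothetical example: looking at a door does not change it, but is a noisy sensor.
T = {("open", "look"): {"open": 1.0}, ("closed", "look"): {"closed": 1.0}}
O = {("open", "look"): {"see_open": 0.8, "see_closed": 0.2},
     ("closed", "look"): {"see_open": 0.3, "see_closed": 0.7}}
b = belief_update({"open": 0.5, "closed": 0.5}, "look", "see_open", T, O)
print(round(b["open"], 3))  # 0.727
```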
Huge-domain experiments
Simulation of a very complex task environment:
large number of buildings and other geographical structures
concurrent, competing tasks such as surveillance, supply delivery, and self-preservation
other agents from whom information can be gathered