Combining Reactive and Deliberative Algorithms
CSCI7000: Final Presentation
Maciej Stachura
Dec. 4, 2009
Outline
• Project Overview
• Positioning System
• Hardware Demo
Project Goals
• Combine deliberative and reactive algorithms
• Show stability and completeness
• Demonstrate multi-robot coverage on iCreate robots.
Coverage Problem
• Cover the entire area.
• Deliberative algorithm plans the next point to visit.
• Reactive algorithm pushes the robot to that point.
• Reactive algorithm adds 2 constraints:
  • Maintain communication distance
  • Collision avoidance
Proof of Stability
[Stability equations shown on slide; not recoverable from the extraction.]
The error decays; therefore the system is stable.
Demo for single vehicle
• Implemented on iCreate.
• 5 points to visit.
• Deliberative Algorithm Selects Point.
• Reactive Algorithm uses potential field to reach point.
• Point reached when within some minimum distance.
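The reactive layer described above can be sketched as a simple attractive-potential controller: the deliberative layer picks the next point, and the reactive layer descends the potential until the robot is within a minimum distance. All names, gains, and the goal radius below are illustrative assumptions, not the original implementation.

```python
import math

GOAL_RADIUS = 0.2   # point counts as "reached" within this distance (assumed)
K_ATT = 1.0         # attractive gain (assumed)

def attractive_step(pos, goal, step=0.1):
    """One reactive step: move down the gradient of U = 0.5*K*||pos-goal||^2."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist < GOAL_RADIUS:
        return pos, True                      # hand back to the deliberative layer
    scale = min(step, K_ATT * dist) / dist    # cap step size near the goal
    return (pos[0] + scale * dx, pos[1] + scale * dy), False

def visit_points(start, points):
    """Deliberative loop: visit each planned point in turn."""
    pos = start
    for goal in points:
        reached = False
        while not reached:
            pos, reached = attractive_step(pos, goal)
    return pos

final = visit_points((0.0, 0.0), [(1.0, 0.0), (1.0, 1.0)])
```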
VIDEO
Multi-robot Case
• 2 Robot Coverage
• Blue is free to move
• Green must stay in communication range.
• MATLAB simulation.
VIDEO
Outline
• Project Overview
• Positioning System
• Hardware Demo
Positioning System
• Problems with Stargazer:
  • Periods of no measurement
  • Occasional bad measurements
• State estimation (SPF):
  • Combine Stargazer with odometry
  • Reject bad measurements
SPF Explanation
• Sigma Point Filter uses Stargazer and odometry measurements to predict the robot position.
• Handles non-Gaussian noise.
• Implemented and tested on the robot platform.
• Performs well even during periods of missing or bad measurements.
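The fusion-and-rejection idea can be illustrated with a deliberately simplified 1D filter (a stand-in, not the actual Sigma Point Filter): predict with odometry, correct with the position measurement, and gate out measurements whose innovation is implausibly large. All names and noise values are assumptions.

```python
def fuse(x, P, odom, z, Q=0.05, R=0.5, gate=3.0):
    """One predict/update cycle with innovation gating.
    x, P: state estimate and variance; odom: odometry displacement;
    z: position measurement, or None during a dropout."""
    # Predict: dead-reckon on odometry, inflate uncertainty
    x_pred = x + odom
    P_pred = P + Q
    if z is None:                        # period of no measurement: coast
        return x_pred, P_pred
    innov = z - x_pred
    S = P_pred + R                       # innovation variance
    if innov * innov > gate * gate * S:  # reject occasional bad measurement
        return x_pred, P_pred
    K = P_pred / S                       # Kalman gain
    return x_pred + K * innov, (1 - K) * P_pred

# Four steps: good fix, dropout, gross outlier (rejected), good fix
x, P = 0.0, 1.0
for odom, z in [(1.0, 1.1), (1.0, None), (1.0, 25.0), (1.0, 3.9)]:
    x, P = fuse(x, P, odom, z)
```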
Outline
• Project Overview
• Positioning System
• Hardware Demo
Roomba Pac-Man
• Implemented 5-robot demo along with Jack Elston.
• Re-creation of the Pac-Man game.
• Demonstrate NetUAS system.
• Showcase most of the concepts from class.
Video
Roomba Pac-Man
• Reactive Algorithms:
  • Walls of maze
  • Potential field
• Deliberative Algorithms:
  • Ghost planning (enumerate states)
  • Collision avoidance
  • Game modes
• Decentralized:
  • Each ghost ran the planning algorithm
  • Collaborated on positions
• Communication:
  • 802.11b ad-hoc network
  • AODV, no centralized node
Roomba Pac-Man
• Simulation:
  • Multi-threaded simulation of robots
  • Combine software with hardware
• Probabilistic Modelling:
  • Sigma Point Filter
• Human/Robot Interaction:
  • Limited human control of Pac-Man
  • Autonomous ghosts
• Hardware Implementation:
  • SBCs running Gentoo
  • Experimental verification
Left to Do
• Implement inter-robot potential field.
• Conduct Experiments
• Generalize Theory?
End
Questions?
http://pacman.elstonj.com
A Gradient-Based Approach
Greg Brown
Outline:
◦ Introduction
◦ Robot State Machine
◦ Gradients for “Grasping” the Object
◦ Gradient for Moving the Object
◦ Convergence
◦ Simulation Results
◦ Continuing Work
Place a single beacon on an object and another at the object’s destination. Multiple robots cooperate to move the object.
Goals:
◦ Minimal/no robot communication
◦ Object has an unknown geometry
◦ Use gradients for reactive navigation

Each Robot Knows:
◦ Distance/direction to object
◦ Distance/direction to destination
◦ Distance/direction to all other robots
◦ Bumper sensor to detect collision

Robots Do Not Know:
◦ Object geometry
◦ Actions other robots are taking

Related “Grasping” Work:
◦ Grasping with hand – maximize torque [Liu et al]
◦ Cage objects for pushing [Fink et al]
◦ Tug boats manipulating barge [Esposito]
◦ ALL require known geometry

My Hybrid Approach:
◦ Even distribution around object
◦ Alternate between convergence and repulsion
Gradients
◦ Similar to Cow Herding example from class.

Pull towards object:
γ = ‖r_i − r_obj‖

Avoid nearby robots:
β = 1 − ((1 + d_c⁴)/d_c⁴) · ∏_{j=1}^{N} [ (‖r_i − r_j‖² − d_c²)² / ((‖r_i − r_j‖² − d_c²)² + 1) ] · [ (sign(d_c − ‖r_i − r_j‖) + 1) / 2 ]

Combined Cost Function:
Cost = γ² / (γ^{κ_c} + β)^{1/κ_c}

Repel from all robots:
Cost = 1 / (1 + β)^{1/κ_r},  with β = ∏_{j=1}^{N} ( ‖r_i − r_j‖² − d_r² )
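The gradient-based reactive idea on this slide (pull toward the object while avoiding nearby robots) can be sketched with a deliberately simplified cost of my own, descended by numerical gradient; the gains, distances, and functional form below are illustrative, not the slide's exact formulas.

```python
import math

def cost(p, obj, others, d_c=1.0):
    """Toy attract/avoid cost: squared distance to object plus a
    penalty for being within d_c of another robot."""
    gamma = math.hypot(p[0] - obj[0], p[1] - obj[1])          # pull toward object
    repel = sum(max(0.0, d_c - math.hypot(p[0]-q[0], p[1]-q[1]))**2
                for q in others)                               # avoid nearby robots
    return gamma**2 + 10.0 * repel

def reactive_step(p, obj, others, step=0.05, h=1e-4):
    """One reactive step: central-difference gradient, normalized step."""
    gx = (cost((p[0]+h, p[1]), obj, others) - cost((p[0]-h, p[1]), obj, others)) / (2*h)
    gy = (cost((p[0], p[1]+h), obj, others) - cost((p[0], p[1]-h), obj, others)) / (2*h)
    n = math.hypot(gx, gy) or 1.0
    return (p[0] - step * gx / n, p[1] - step * gy / n)

# Robot approaches the object at the origin but stops short of a robot at (0.5, 0)
p = (3.0, 0.0)
for _ in range(100):
    p = reactive_step(p, (0.0, 0.0), [(0.5, 0.0)])
```

The robot settles where the attractive and repulsive gradients balance, which is the qualitative behavior the slide's costs are designed to produce.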
Related Work:
◦ Formations [Tanner and Kumar]
◦ Flocking [Lindhé et al]
◦ Pushing objects [Fink et al, Esposito]
◦ No catastrophic failure if out of position.

My Approach:
◦ Head towards destination in steps
◦ Keep close to object
◦ Communicate “through” object
◦ Maintain orientation

Assumes the forklift on the robot can rotate 360º.
Next Step Vector:

Pull to destination:
γ₁ = ‖r_i − r_{γi}‖,  where r_{γi} = r_{ideal,i} + d_m (r_ObjCenter − r_ObjDest) / ‖r_ObjCenter − r_ObjDest‖

Valley perpendicular to travel vector:
m = −(r_ObjCenter,x − r_ObjDest,x) / (r_ObjCenter,y − r_ObjDest,y + 0.0001)

γ₂ = | m·r_{i,x} − r_{i,y} − m·r_{γ,x} + r_{γ,y} | / √(m² + 1)

Combined Cost Function:
Cost = γ₁^{κ₁} · γ₂^{κ₂}
[Figure: histogram of convergence times — number of occurrences vs. time steps, for 3, 4, 5, and 6 robots.]
◦ Resolve convergence problems
◦ Noise in sensing
◦ Noise in actuation
[Figure: second histogram — number of occurrences vs. time steps, for 3, 4, 5, and 6 robots.]
A Young Modular Robot’s Guide to Locomotion
Ben Pearre
Computer Science
University of Colorado at Boulder, USA
December 6, 2009
Outline
Modular Robots
Learning: The Problem · The Policy Gradient · Domain Knowledge
Contributions: Going Forward · Steering · Curriculum Development
Conclusion
Modular Robots
How to get these to move?
The Learning Problem
Given unknown sensations and actions, learn a task:
◮ Sensations s ∈ R^n
◮ State x ∈ R^d
◮ Action u ∈ R^p
◮ Reward r ∈ R
◮ Policy π(x, θ) = Pr(u | x, θ) : R^{|θ|} × R^{|u|}

Example policy:

u(x, θ) = θ₀ + ∑_i θ_i (x − b_i)ᵀ D_i (x − b_i) + N(0, σ)
What does that mean for locomotion?
Policy Gradient Reinforcement Learning: Finite Difference
Vary θ:
◮ Measure performance J0 of π(θ)
◮ Measure performance J1...n of π(θ + ∆1...nθ)
◮ Solve regression, move θ along gradient.
gradient = (ΔΘᵀ ΔΘ)⁻¹ ΔΘᵀ J

where ΔΘ = [ Δθ₁ ⋯ Δθₙ ]ᵀ and J = [ J₁ − J₀ ⋯ Jₙ − J₀ ]ᵀ
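The finite-difference procedure above can be sketched directly: perturb θ, measure the return differences, and regress them on the perturbations. The quadratic "performance" function here is a stand-in for a real rollout.

```python
import numpy as np

rng = np.random.default_rng(0)

def J(theta):
    """Stand-in performance measure (true optimum at [1, -2])."""
    return -np.sum((theta - np.array([1.0, -2.0]))**2)

def fd_gradient(theta, n=20, eps=0.1):
    """Estimate the gradient as (ΔΘᵀΔΘ)⁻¹ ΔΘᵀ J via least squares."""
    J0 = J(theta)
    dTheta = np.array([rng.normal(0.0, eps, size=theta.size) for _ in range(n)])
    dJ = np.array([J(theta + d) - J0 for d in dTheta])     # J_i − J_0
    g, *_ = np.linalg.lstsq(dTheta, dJ, rcond=None)
    return g

# Move θ along the estimated gradient
theta = np.zeros(2)
for _ in range(50):
    theta = theta + 0.1 * fd_gradient(theta)
```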
Policy Gradient Reinforcement Learning: Likelihood Ratio
Vary u:
◮ Measure performance J(π(θ)) of π(θ) with noise. . .
◮ Compute log-probability of generated trajectory Pr(τ |θ)
Gradient = ⟨ ( ∑_{k=0}^{H} ∇_θ log π_θ(u_k | x_k) ) ( ∑_{l=0}^{H} r_l ) ⟩
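This likelihood-ratio (REINFORCE-style) estimator averages the score of each trajectory times its return. A minimal sketch, with a one-parameter Gaussian policy and an illustrative reward of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 0.2  # policy noise (assumed)

def rollout(theta, H=5):
    """One noisy episode; the reward peaks at action u = 2 (illustrative)."""
    u = rng.normal(theta, SIGMA, size=H)   # u_k ~ π_θ = N(θ, σ)
    r = -(u - 2.0)**2                       # r_l
    return u, r

def lr_gradient(theta, episodes=200):
    """Average (Σ_k ∇θ log π(u_k)) × (Σ_l r_l) over rollouts."""
    grads = []
    for _ in range(episodes):
        u, r = rollout(theta)
        dlogp = np.sum((u - theta) / SIGMA**2)  # score of a Gaussian policy
        grads.append(dlogp * np.sum(r))
    return np.mean(grads)

theta = 0.0
for _ in range(100):
    theta += 0.01 * lr_gradient(theta)
```

Unlike the finite-difference estimator, this one perturbs actions rather than parameters, so it needs the policy's log-probability but no extra rollouts per parameter.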
Why is RL slow?
“Curse of Dimensionality”
◮ Exploration
◮ Learning rate
◮ Domain representation
◮ Policy representation
◮ Over- and under-actuation
◮ Domain knowledge
Domain Knowledge
Infinite space of policies to explore.
◮ RL is model-free. So what?
◮ Representation is bias.
◮ Bias search towards “good” solutions
◮ Learn all of physics. . . and apply it?
◮ Previous experience in this domain?
◮ Policy implemented by <programmer, agent> “autonomous”?
How would knowledge of this domain help?
Dimensionality Reduction
Task learning as domain-knowledge acquisition:
◮ Experience with a domain
◮ Skill at completing some task
◮ Skill at completing some set of tasks?
◮ Taskspace Manifold
Goals
1. Apply PGRL to a new domain.
2. Learn mapping from task manifold to policy manifold.
3. Robot school?
1: Learning to locomote
◮ Sensors: force feedback on servos? Or not.
◮ Policy: u ∈ R^8 controls servos, u_i = N(θ_i, σ)
◮ Reward: forward speed
◮ Domain knowledge: none
Demo?
1: Learning to locomote
[Figure: learning to move. Top: 10-step forward speed v vs. time step s (0–2500). Bottom: effort θ per servo (steer bow, steer stern, bow, port fwd, stbd fwd, port aft, stbd aft, stern) vs. time step.]
2: Learning to get to a target
◮ Sensors: Bearing to goal.
◮ Policy: u ∈ R^8 controls servos
◮ Policy parameters: θ ∈ R^16

μ_i(x, θ) = θ_i · s = [ θ_{i,0}  θ_{i,1} ] [ 1  φ ]ᵀ   (1–2)

u_i = N(μ_i, σ)   (3)

∇_{θ_i} log π(x, θ) = (1/σ²)(u_i − θ_i · s) · s   (4)
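Equation (4) is the score of a Gaussian policy whose mean is linear in the sensation s = [1, φ]; it can be checked numerically against a finite difference of the log-density. The parameter values below are arbitrary test inputs.

```python
import math

SIGMA = 0.3  # policy noise (arbitrary for this check)

def log_pi(theta, s, u):
    """Log-density of u ~ N(θ·s, σ²)."""
    mu = theta[0] * s[0] + theta[1] * s[1]
    return -0.5 * math.log(2 * math.pi * SIGMA**2) - (u - mu)**2 / (2 * SIGMA**2)

theta, s, u = [0.4, -0.7], [1.0, 0.25], 0.1   # s = [1, φ] with bearing φ = 0.25
mu = theta[0] * s[0] + theta[1] * s[1]

# Analytic score from (4): (u − θ·s) s / σ²
analytic = [(u - mu) * s[0] / SIGMA**2, (u - mu) * s[1] / SIGMA**2]

# Central finite difference of log π with respect to each θ component
h = 1e-6
numeric = [
    (log_pi([theta[0] + h, theta[1]], s, u) - log_pi([theta[0] - h, theta[1]], s, u)) / (2 * h),
    (log_pi([theta[0], theta[1] + h], s, u) - log_pi([theta[0], theta[1] - h], s, u)) / (2 * h),
]
```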
2: Task space → policy space
◮ 16-DOF learning FAIL!
◮ Try simpler task:
  ◮ Learn to locomote with θ ∈ R^16
◮ Try bootstrapping:
  1. Learn to locomote with 8 DOF
  2. Add new sensing and control DOF
◮ CHEATING! Why?
[Figure: time to complete task (seconds) vs. task number.]
Curriculum development for manifold discovery?
◮ Etude in Locomotion
  ◮ Task-space manifold for locomotion: θ ∈ ξ · [ 0 0 1 −1 1 −1 1 1 ]ᵀ
  ◮ Stop exploring in task nullspace
  ◮ FAST!
◮ Etude in Steering
  ◮ Can task be completed on locomotion manifold?
  ◮ One possible approximate solution uses the bases [ 0 0 1 −1 1 −1 1 1 ]ᵀ and [ 1 −1 0 0 0 0 0 0 ]ᵀ
  ◮ Can second basis be learned?
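"Stop exploring in the task nullspace" can be sketched as projecting parameter perturbations onto the span of the task-space bases before applying them, so exploration stays on the low-dimensional manifold. The basis rows are the ones shown on the slide; everything else is an illustrative construction.

```python
import numpy as np

# Task-space bases from the slide: locomotion and steering directions in θ ∈ R^8
basis = np.array([
    [0, 0, 1, -1, 1, -1, 1, 1],    # locomotion
    [1, -1, 0, 0, 0, 0, 0, 0],     # steering
], dtype=float)

# Orthonormalize the bases and build the projector onto their span
q, _ = np.linalg.qr(basis.T)       # columns of q span the 2-D manifold
P = q @ q.T                        # 8x8 projection matrix

# A full-rank random perturbation gets squashed onto the 2-DOF manifold
rng = np.random.default_rng(2)
full_perturbation = rng.normal(size=8)
on_manifold = P @ full_perturbation
```

Exploring only along `on_manifold`-style directions reduces the effective search from 8 (or 16) dimensions to 2, which is the speedup the slide's "FAST!" refers to.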
3: How to teach a robot?
How to teach an animal?
1. Reward basic skills
2. Develop control along useful DOFs
3. Make skill more complex
4. A good solution NOW!
Conclusion
Exorcising the Curse of Dimensionality
◮ PGRL works for low-DOF problems.
◮ Task-space dimension < state-space dimension.
◮ Learn f: task-space manifold → policy-space manifold.