Combining Reactive and Deliberative Algorithms
CSCI7000: Final Presentation
Maciej Stachura
Dec. 4, 2009
Outline
• Project Overview
• Positioning System
• Hardware Demo
Project Goals
• Combine deliberative and reactive algorithms
• Show stability and completeness
• Demonstrate multi-robot coverage on iCreate robots.
Coverage Problem
• Cover the entire area.
• Deliberative algorithm plans the next point to visit.
• Reactive algorithm pushes the robot to that point.
• Reactive algorithm adds 2 constraints:
  • Maintain communication distance
  • Collision avoidance
Proof of Stability
[Stability equations shown on slide; not recoverable from the extraction.]
The error decays; therefore the system is stable.
Demo for single vehicle
• Implemented on iCreate.
• 5 points to visit.
• Deliberative Algorithm Selects Point.
• Reactive Algorithm uses potential field to reach point.
• Point reached when within some minimum distance.
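The reactive layer described above can be sketched as a simple attractive-potential controller: the deliberative layer picks the next point, and the reactive layer descends the potential until the robot is within a minimum distance. All names, gains, and the goal radius below are illustrative assumptions, not the original implementation.

```python
import math

GOAL_RADIUS = 0.2   # point counts as "reached" within this distance (assumed)
K_ATT = 1.0         # attractive gain (assumed)

def attractive_step(pos, goal, step=0.1):
    """One reactive step: move down the gradient of U = 0.5*K*||pos-goal||^2."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist < GOAL_RADIUS:
        return pos, True                      # hand back to the deliberative layer
    scale = min(step, K_ATT * dist) / dist    # cap step size near the goal
    return (pos[0] + scale * dx, pos[1] + scale * dy), False

def visit_points(start, points):
    """Deliberative loop: visit each planned point in turn."""
    pos = start
    for goal in points:
        reached = False
        while not reached:
            pos, reached = attractive_step(pos, goal)
    return pos

final = visit_points((0.0, 0.0), [(1.0, 0.0), (1.0, 1.0)])
```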
VIDEO
Multi-robot Case
• 2 Robot Coverage
• Blue is free to move
• Green must stay in communication range.
• MATLAB simulation.
VIDEO
Outline
• Project Overview
• Positioning System
• Hardware Demo
Positioning System
• Problems with Stargazer:
  • Periods of no measurement
  • Occasional bad measurements
• State estimation (SPF):
  • Combine Stargazer with odometry
  • Reject bad measurements
SPF Explanation
• Sigma Point Filter uses Stargazer and odometry measurements to predict the robot position.
• Handles non-Gaussian noise.
• Implemented and tested on the robot platform.
• Performs well even during periods of missing or bad measurements.
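The fusion-and-rejection idea can be illustrated with a deliberately simplified 1D filter (a stand-in, not the actual Sigma Point Filter): predict with odometry, correct with the position measurement, and gate out measurements whose innovation is implausibly large. All names and noise values are assumptions.

```python
def fuse(x, P, odom, z, Q=0.05, R=0.5, gate=3.0):
    """One predict/update cycle with innovation gating.
    x, P: state estimate and variance; odom: odometry displacement;
    z: position measurement, or None during a dropout."""
    # Predict: dead-reckon on odometry, inflate uncertainty
    x_pred = x + odom
    P_pred = P + Q
    if z is None:                        # period of no measurement: coast
        return x_pred, P_pred
    innov = z - x_pred
    S = P_pred + R                       # innovation variance
    if innov * innov > gate * gate * S:  # reject occasional bad measurement
        return x_pred, P_pred
    K = P_pred / S                       # Kalman gain
    return x_pred + K * innov, (1 - K) * P_pred

# Four steps: good fix, dropout, gross outlier (rejected), good fix
x, P = 0.0, 1.0
for odom, z in [(1.0, 1.1), (1.0, None), (1.0, 25.0), (1.0, 3.9)]:
    x, P = fuse(x, P, odom, z)
```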
Outline
• Project Overview
• Positioning System
• Hardware Demo
Roomba Pac-Man
• Implemented 5-robot demo along with Jack Elston.
• Re-creation of the Pac-Man game.
• Demonstrate NetUAS system.
• Showcase most of the concepts from class.
Video
Roomba Pac-Man
• Reactive Algorithms:
  • Walls of maze
  • Potential field
• Deliberative Algorithms:
  • Ghost planning (enumerate states)
  • Collision avoidance
  • Game modes
• Decentralized:
  • Each ghost ran the planning algorithm
  • Collaborated on positions
• Communication:
  • 802.11b ad-hoc network
  • AODV, no centralized node
Roomba Pac-Man
• Simulation:
  • Multi-threaded simulation of robots
  • Combine software with hardware
• Probabilistic Modelling:
  • Sigma Point Filter
• Human/Robot Interaction:
  • Limited human control of Pac-Man
  • Autonomous ghosts
• Hardware Implementation:
  • SBCs running Gentoo
  • Experimental verification
Left to Do
• Implement inter-robot potential field.
• Conduct Experiments
• Generalize Theory?
End
Questions?
http://pacman.elstonj.com
A Gradient-Based Approach
Greg Brown
Outline:
◦ Introduction
◦ Robot State Machine
◦ Gradients for “Grasping” the Object
◦ Gradient for Moving the Object
◦ Convergence
◦ Simulation Results
◦ Continuing Work
Place a single beacon on an object and another at the object’s destination. Multiple robots cooperate to move the object.
Goals:
◦ Minimal/no robot communication
◦ Object has an unknown geometry
◦ Use gradients for reactive navigation

Each Robot Knows:
◦ Distance/direction to object
◦ Distance/direction to destination
◦ Distance/direction to all other robots
◦ Bumper sensor to detect collision

Robots Do Not Know:
◦ Object geometry
◦ Actions other robots are taking

Related “Grasping” Work:
◦ Grasping with hand – maximize torque [Liu et al]
◦ Cage objects for pushing [Fink et al]
◦ Tug boats manipulating barge [Esposito]
◦ ALL require known geometry

My Hybrid Approach:
◦ Even distribution around object
◦ Alternate between convergence and repulsion
Gradients
◦ Similar to Cow Herding example from class.

Pull towards object:
γ = ‖r_i − r_obj‖

Avoid nearby robots:
β = 1 − ((1 + d_c⁴)/d_c⁴) · ∏_{j=1}^{N} [ (‖r_i − r_j‖² − d_c²)² / ((‖r_i − r_j‖² − d_c²)² + 1) ] · [ (sign(d_c − ‖r_i − r_j‖) + 1) / 2 ]

Combined Cost Function:
Cost = γ² / (γ^{κ_c} + β)^{1/κ_c}

Repel from all robots:
Cost = 1 / (1 + β)^{1/κ_r},  with β = ∏_{j=1}^{N} ( ‖r_i − r_j‖² − d_r² )
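The gradient-based reactive idea on this slide (pull toward the object while avoiding nearby robots) can be sketched with a deliberately simplified cost of my own, descended by numerical gradient; the gains, distances, and functional form below are illustrative, not the slide's exact formulas.

```python
import math

def cost(p, obj, others, d_c=1.0):
    """Toy attract/avoid cost: squared distance to object plus a
    penalty for being within d_c of another robot."""
    gamma = math.hypot(p[0] - obj[0], p[1] - obj[1])          # pull toward object
    repel = sum(max(0.0, d_c - math.hypot(p[0]-q[0], p[1]-q[1]))**2
                for q in others)                               # avoid nearby robots
    return gamma**2 + 10.0 * repel

def reactive_step(p, obj, others, step=0.05, h=1e-4):
    """One reactive step: central-difference gradient, normalized step."""
    gx = (cost((p[0]+h, p[1]), obj, others) - cost((p[0]-h, p[1]), obj, others)) / (2*h)
    gy = (cost((p[0], p[1]+h), obj, others) - cost((p[0], p[1]-h), obj, others)) / (2*h)
    n = math.hypot(gx, gy) or 1.0
    return (p[0] - step * gx / n, p[1] - step * gy / n)

# Robot approaches the object at the origin but stops short of a robot at (0.5, 0)
p = (3.0, 0.0)
for _ in range(100):
    p = reactive_step(p, (0.0, 0.0), [(0.5, 0.0)])
```

The robot settles where the attractive and repulsive gradients balance, which is the qualitative behavior the slide's costs are designed to produce.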
Related Work:
◦ Formations [Tanner and Kumar]
◦ Flocking [Lindhé et al]
◦ Pushing objects [Fink et al, Esposito]
◦ No catastrophic failure if out of position.

My Approach:
◦ Head towards destination in steps
◦ Keep close to object
◦ Communicate “through” object
◦ Maintain orientation

Assumes the forklift on the robot can rotate 360º.
Next Step Vector:

Pull to destination:
γ₁ = ‖r_i − r_{γi}‖,  where r_{γi} = r_{ideal,i} + d_m (r_ObjCenter − r_ObjDest) / ‖r_ObjCenter − r_ObjDest‖

Valley perpendicular to travel vector:
m = −(r_ObjCenter,x − r_ObjDest,x) / (r_ObjCenter,y − r_ObjDest,y + 0.0001)

γ₂ = | m·r_{i,x} − r_{i,y} − m·r_{γ,x} + r_{γ,y} | / √(m² + 1)

Combined Cost Function:
Cost = γ₁^{κ₁} · γ₂^{κ₂}
[Figure: histogram of convergence times — number of occurrences vs. time steps, for 3, 4, 5, and 6 robots.]
◦ Resolve convergence problems
◦ Noise in sensing
◦ Noise in actuation
[Figure: second histogram — number of occurrences vs. time steps, for 3, 4, 5, and 6 robots.]
A Young Modular Robot’s Guide to Locomotion
Ben Pearre
Computer Science
University of Colorado at Boulder, USA
December 6, 2009
Outline
Modular Robots
Learning: The Problem · The Policy Gradient · Domain Knowledge
Contributions: Going Forward · Steering · Curriculum Development
Conclusion
Modular Robots
How to get these to move?
The Learning Problem
Given unknown sensations and actions, learn a task:
◮ Sensations s ∈ R^n
◮ State x ∈ R^d
◮ Action u ∈ R^p
◮ Reward r ∈ R
◮ Policy π(x, θ) = Pr(u | x, θ) : R^{|θ|} × R^{|u|}

Example policy:

u(x, θ) = θ₀ + ∑_i θ_i (x − b_i)ᵀ D_i (x − b_i) + N(0, σ)
What does that mean for locomotion?
Policy Gradient Reinforcement Learning: Finite Difference
Vary θ:
◮ Measure performance J0 of π(θ)
◮ Measure performance J1...n of π(θ + ∆1...nθ)
◮ Solve regression, move θ along gradient.
gradient = (ΔΘᵀ ΔΘ)⁻¹ ΔΘᵀ J

where ΔΘ = [ Δθ₁ ⋯ Δθₙ ]ᵀ and J = [ J₁ − J₀ ⋯ Jₙ − J₀ ]ᵀ
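The finite-difference procedure above can be sketched directly: perturb θ, measure the return differences, and regress them on the perturbations. The quadratic "performance" function here is a stand-in for a real rollout.

```python
import numpy as np

rng = np.random.default_rng(0)

def J(theta):
    """Stand-in performance measure (true optimum at [1, -2])."""
    return -np.sum((theta - np.array([1.0, -2.0]))**2)

def fd_gradient(theta, n=20, eps=0.1):
    """Estimate the gradient as (ΔΘᵀΔΘ)⁻¹ ΔΘᵀ J via least squares."""
    J0 = J(theta)
    dTheta = np.array([rng.normal(0.0, eps, size=theta.size) for _ in range(n)])
    dJ = np.array([J(theta + d) - J0 for d in dTheta])     # J_i − J_0
    g, *_ = np.linalg.lstsq(dTheta, dJ, rcond=None)
    return g

# Move θ along the estimated gradient
theta = np.zeros(2)
for _ in range(50):
    theta = theta + 0.1 * fd_gradient(theta)
```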
Policy Gradient Reinforcement Learning: Likelihood Ratio
Vary u:
◮ Measure performance J(π(θ)) of π(θ) with noise. . .
◮ Compute log-probability of generated trajectory Pr(τ |θ)
Gradient = ⟨ ( ∑_{k=0}^{H} ∇_θ log π_θ(u_k | x_k) ) ( ∑_{l=0}^{H} r_l ) ⟩
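This likelihood-ratio (REINFORCE-style) estimator averages the score of each trajectory times its return. A minimal sketch, with a one-parameter Gaussian policy and an illustrative reward of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 0.2  # policy noise (assumed)

def rollout(theta, H=5):
    """One noisy episode; the reward peaks at action u = 2 (illustrative)."""
    u = rng.normal(theta, SIGMA, size=H)   # u_k ~ π_θ = N(θ, σ)
    r = -(u - 2.0)**2                       # r_l
    return u, r

def lr_gradient(theta, episodes=200):
    """Average (Σ_k ∇θ log π(u_k)) × (Σ_l r_l) over rollouts."""
    grads = []
    for _ in range(episodes):
        u, r = rollout(theta)
        dlogp = np.sum((u - theta) / SIGMA**2)  # score of a Gaussian policy
        grads.append(dlogp * np.sum(r))
    return np.mean(grads)

theta = 0.0
for _ in range(100):
    theta += 0.01 * lr_gradient(theta)
```

Unlike the finite-difference estimator, this one perturbs actions rather than parameters, so it needs the policy's log-probability but no extra rollouts per parameter.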
Why is RL slow?
“Curse of Dimensionality”
◮ Exploration
◮ Learning rate
◮ Domain representation
◮ Policy representation
◮ Over- and under-actuation
◮ Domain knowledge
Domain Knowledge
Infinite space of policies to explore.
◮ RL is model-free. So what?
◮ Representation is bias.
◮ Bias search towards “good” solutions
◮ Learn all of physics. . . and apply it?
◮ Previous experience in this domain?
◮ Policy implemented by <programmer, agent> “autonomous”?
How would knowledge of this domain help?
Dimensionality Reduction
Task learning as domain-knowledge acquisition:
◮ Experience with a domain
◮ Skill at completing some task
◮ Skill at completing some set of tasks?
◮ Taskspace Manifold
Goals
1. Apply PGRL to a new domain.
2. Learn mapping from task manifold to policy manifold.
3. Robot school?
1: Learning to locomote
◮ Sensors: force feedback on servos? Or not.
◮ Policy: u ∈ R^8 controls servos, u_i = N(θ_i, σ)
◮ Reward: forward speed
◮ Domain knowledge: none
Demo?
1: Learning to locomote
[Figure: learning to move. Top: 10-step forward speed v vs. time step s (0–2500). Bottom: effort θ per servo (steer bow, steer stern, bow, port fwd, stbd fwd, port aft, stbd aft, stern) vs. time step.]
2: Learning to get to a target
◮ Sensors: Bearing to goal.
◮ Policy: u ∈ R^8 controls servos
◮ Policy parameters: θ ∈ R^16

μ_i(x, θ) = θ_i · s = [ θ_{i,0}  θ_{i,1} ] [ 1  φ ]ᵀ   (1–2)

u_i = N(μ_i, σ)   (3)

∇_{θ_i} log π(x, θ) = (1/σ²)(u_i − θ_i · s) · s   (4)
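Equation (4) is the score of a Gaussian policy whose mean is linear in the sensation s = [1, φ]; it can be checked numerically against a finite difference of the log-density. The parameter values below are arbitrary test inputs.

```python
import math

SIGMA = 0.3  # policy noise (arbitrary for this check)

def log_pi(theta, s, u):
    """Log-density of u ~ N(θ·s, σ²)."""
    mu = theta[0] * s[0] + theta[1] * s[1]
    return -0.5 * math.log(2 * math.pi * SIGMA**2) - (u - mu)**2 / (2 * SIGMA**2)

theta, s, u = [0.4, -0.7], [1.0, 0.25], 0.1   # s = [1, φ] with bearing φ = 0.25
mu = theta[0] * s[0] + theta[1] * s[1]

# Analytic score from (4): (u − θ·s) s / σ²
analytic = [(u - mu) * s[0] / SIGMA**2, (u - mu) * s[1] / SIGMA**2]

# Central finite difference of log π with respect to each θ component
h = 1e-6
numeric = [
    (log_pi([theta[0] + h, theta[1]], s, u) - log_pi([theta[0] - h, theta[1]], s, u)) / (2 * h),
    (log_pi([theta[0], theta[1] + h], s, u) - log_pi([theta[0], theta[1] - h], s, u)) / (2 * h),
]
```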
2: Task space → policy space
◮ 16-DOF learning FAIL!
◮ Try simpler task:
  ◮ Learn to locomote with θ ∈ R^16
◮ Try bootstrapping:
  1. Learn to locomote with 8 DOF
  2. Add new sensing and control DOF
◮ CHEATING! Why?
[Figure: time to complete task (seconds) vs. task number.]
Curriculum development for manifold discovery?
◮ Etude in Locomotion
  ◮ Task-space manifold for locomotion: θ ∈ ξ · [ 0 0 1 −1 1 −1 1 1 ]ᵀ
  ◮ Stop exploring in task nullspace
  ◮ FAST!
◮ Etude in Steering
  ◮ Can task be completed on locomotion manifold?
  ◮ One possible approximate solution uses the bases [ 0 0 1 −1 1 −1 1 1 ]ᵀ and [ 1 −1 0 0 0 0 0 0 ]ᵀ
  ◮ Can second basis be learned?
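"Stop exploring in the task nullspace" can be sketched as projecting parameter perturbations onto the span of the task-space bases before applying them, so exploration stays on the low-dimensional manifold. The basis rows are the ones shown on the slide; everything else is an illustrative construction.

```python
import numpy as np

# Task-space bases from the slide: locomotion and steering directions in θ ∈ R^8
basis = np.array([
    [0, 0, 1, -1, 1, -1, 1, 1],    # locomotion
    [1, -1, 0, 0, 0, 0, 0, 0],     # steering
], dtype=float)

# Orthonormalize the bases and build the projector onto their span
q, _ = np.linalg.qr(basis.T)       # columns of q span the 2-D manifold
P = q @ q.T                        # 8x8 projection matrix

# A full-rank random perturbation gets squashed onto the 2-DOF manifold
rng = np.random.default_rng(2)
full_perturbation = rng.normal(size=8)
on_manifold = P @ full_perturbation
```

Exploring only along `on_manifold`-style directions reduces the effective search from 8 (or 16) dimensions to 2, which is the speedup the slide's "FAST!" refers to.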
3: How to teach a robot?
How to teach an animal?
1. Reward basic skills
2. Develop control along useful DOFs
3. Make skill more complex
4. A good solution NOW!
Conclusion
Exorcising the Curse of Dimensionality
◮ PGRL works for low-DOF problems.
◮ Task-space dimension < state-space dimension.
◮ Learn f: task-space manifold → policy-space manifold.