33
Learning in Heuristic Search-based Planning Maxim Likhachev Associate Professor Robotics Institute/NREC Carnegie Mellon University Search-based Planning Lab (SBPL) Joint work with Ishani Chatterjee, Ben Cohen, Andrew Dornbush, Victor Hwang, Venkatraman Narayanan, Michael Phillips, Kalyan Vasudev

Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Learning

in

Heuristic Search-based Planning

Maxim Likhachev

Associate Professor

Robotics Institute/NREC

Carnegie Mellon University

Search-based Planning Lab (SBPL)

Joint work with Ishani Chatterjee, Ben Cohen, Andrew Dornbush, Victor Hwang,

Venkatraman Narayanan, Michael Phillips, Kalyan Vasudev

Page 2: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Going into the Real-world

• Robot models and simple world interactions can be pre-encoded

• Planning on those models enables the robots to operate under

benign/narrow conditions right away

• Real-world: real-time + going beyond what’s given

Carnegie Mellon University 2

Waseda/Mitsubishi robot

Page 3: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning in Search-based Planning

Carnegie Mellon University 3

Speeding up

planning

Learning

cost function

Going beyond

the prior model

Waseda/

Mitsubishi

Re-use of previous results within search (Phillips et al.,’12; Islam et al.,‘18)

Learning heuristic functions (Bhardwaj et al.,’17; Paden & Frazzoli,’17; Thayer et al.,’11)

Learning order of expansions (Choudhary et al.,’17)

Page 4: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning in Search-based Planning

Carnegie Mellon University 4

Speeding up

planning

Learning

cost function

Going beyond

the prior model

Learning a cost function from demonstrations (Ratliff et al.,’09; Wulfmeier et al.,’17)

Crusher (from Ratliff et a., ‘09 paper)

Page 5: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning in Search-based Planning

Carnegie Mellon University 5

Speeding up

planning

Learning

cost function

Going beyond

the prior model

Learning additional dimensions to reason over (Phillips et al.,’13)

Combining learned skills and prior model (Vasudev et al., ongoing)

Page 6: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning in Search-based Planning

Carnegie Mellon University 6

Speeding up

planning

Learning

cost function

Going beyond

the prior model

Waseda/

Mitsubishi

Re-use of previous results within search (Phillips et al.,’12; Islam et al.,’18)

Learning heuristic functions (Bhardwaj et al.,’17; Paden & Frazzoli,’17; Thayer et al.,’11)

Learning order of expansions (Choudhary et al.,’17)

Page 7: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Experience Graphs [Phillips et al., RSS’12]

• Many planning tasks are repetitive

- loading a dishwasher

- opening doors

- moving objects around a warehouse

- …

• Can we re-use prior experience to

accelerate planning, in the context of

search-based planning?

• Especially useful for high-dimensional

problems such as mobile manipulation!

Carnegie Mellon University 7

Page 8: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Experience Graphs [Phillips et al., RSS’12]

Given a set of previous paths (experiences)…

Carnegie Mellon University 8

Page 9: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Experience Graphs [Phillips et al., RSS’12]

Put them together into an E-graph (Experience graph)

Carnegie Mellon University 9

Page 10: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Given a new planning query…

Carnegie Mellon University 10

Experience Graphs [Phillips et al., RSS’12]

Page 11: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

…would like to re-use E-graph to speed up planning in similar situations

goal

start

Carnegie Mellon University 11

Experience Graphs [Phillips et al., RSS’12]

Page 12: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

…would like to re-use E-graph to speed up planning in similar situations

goal

start

Carnegie Mellon University 12

Re-use is via focusing search with a recomputed hε() heuristic function:

Experience Graphs [Phillips et al., RSS’12]

Page 13: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

…would like to re-use E-graph to speed up planning in similar situations

goal

start

Carnegie Mellon University 13

Re-use is via focusing search with a recomputed hε() heuristic function:

Experience Graphs [Phillips et al., RSS’12]

General idea:Instead of biasing the search towards the goal, heuristics hε(s) biases it towards a set of paths in Experience Graph

General idea:Instead of biasing the search towards the goal, heuristics hε(s) biases it towards a set of paths in Experience Graph

Page 14: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

…would like to re-use E-graph to speed up planning in similar situations

goal

start

Carnegie Mellon University 14

Re-use is via focusing search with a recomputed hε() heuristic function:

Experience Graphs [Phillips et al., RSS’12]

Can be computed via a single Dijkstra’s search on the Experience Graph

Can be computed via a single Dijkstra’s search on the Experience Graph

Page 15: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

…would like to re-use E-graph to speed up planning in similar situations

goal

start

Carnegie Mellon University 16

Re-use is via focusing search with a recomputed hε() heuristic function:

heuristics hε(s) is guaranteed to be ε-consistentheuristics hε(s) is guaranteed to be ε-consistent

Experience Graphs [Phillips et al., RSS’12]

Page 16: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

…would like to re-use E-graph to speed up planning in similar situations

goal

start

Carnegie Mellon University 17

Re-use is via focusing search with a recomputed hε() heuristic function:

Theorem 1: Algorithm is complete with respect to the original graph

Theorem 2: The cost of the solution is within a given bound on sub-optimality

Theorem 1: Algorithm is complete with respect to the original graph

Theorem 2: The cost of the solution is within a given bound on sub-optimality

Experience Graphs [Phillips et al., RSS’12]

Page 17: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Application of Experience Graphs

• Learning to plan faster from experience and demonstrations

Carnegie Mellon University 18

Page 18: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning in Search-based Planning

Carnegie Mellon University 19

Speeding up

planning

Learning

cost function

Going beyond

the prior model

Learning additional dimensions to reason over (Phillips et al.,’13)

Combining learned skills and prior model (Vasudev et al., ongoing)

Page 19: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning Additional Dimensions

• Learning Additional Dimensions in the Graph from Demonstrations

[Phillips et al., RSS’13]

Carnegie Mellon University 20

Page 20: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning Additional Dimensions

• Learning Additional Dimensions in the Graph from Demonstrations

[Phillips et al., RSS’13]

Carnegie Mellon University 21

Demonstrations provided in simulation; work by A. Dornbush

Page 21: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Learning in Search-based Planning

Carnegie Mellon University 22

Speeding up

planning

Learning

cost function

Going beyond

the prior model

Learning additional dimensions to reason over (Phillips et al.,’13)

Combining learned skills and prior model (Vasudev et al., ongoing)

Page 22: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 23

How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?

Page 23: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 24

How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?

skills connect disconnected

components in G

Page 24: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 25

How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?

skills connect disconnected

components in G

Page 25: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 26

How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?How skills ψi…k should be integrated with G, so that a planner can generate an overall plan?

skills connect disconnected

components in G

Page 26: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 27

We assume ψi: X→{a,X’}, and each X maps onto unique S

We assume ψi: X→{a,X’}, and each X maps onto unique S

A skill could potentially be available at each state S, but depending on data, at some states there is higher

confidence in its success than at others

A skill could potentially be available at each state S, but depending on data, at some states there is higher

confidence in its success than at others

Page 27: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 28

• If confidence is estimated (e.g., via Dropouts [Gal & Ghahramani]), then:– Option 1: cost(s,a’,s’) is inflated proportionally to the estimated confidence

Page 28: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 29

• If confidence is estimated (e.g., via Dropouts [Gal & Ghahramani]), then:– Option 1: cost(s,a’,s’) is inflated proportionally to the estimated confidence

Page 29: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 30

• If confidence is estimated (e.g., via Dropouts [Gal & Ghahramani]), then:– Option 1: cost(s,a’,s’) is inflated proportionally to the estimated confidence

– Option 2: represent the planning problem as POMDP

Page 30: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 31

• If confidence is estimated (e.g., via Dropouts [Gal & Ghahramani]), then:– Option 1: cost(s,a’,s’) is inflated proportionally to the estimated confidence

– Option 2: represent the planning problem as POMDP

- planning is exponential in (S, ψi) pairs

- however, there exists a clear preference on the outcomes: it is always preferred

for a skill to be successful at a given S

Page 31: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Integrating Learned Skills

• Suppose:– We have a graph G = {S, E} that describes how the robot can move its base/arms

– We have a set of k skills ψi…k that include skills for pushing/pulling doors/drawer

Carnegie Mellon University 32

• If confidence is estimated (e.g., via Dropouts [Gal & Ghahramani]), then:– Option 1: cost(s,a’,s’) is inflated proportionally to the estimated confidence

– Option 2: represent the planning problem as POMDP

- planning is exponential in (S, ψi) pairs

- however, there exists a clear preference on the outcomes: it is always preferred

for a skill to be successful at a given S

Planning problem can be decomposed into a series of graph searches using PPCP (Likhachev & Stentz,’09):

- avoids planning in a belief state-space- scales to large-scale problems in real-time- provides rigorous theoretical guarantees

Planning problem can be decomposed into a series of graph searches using PPCP (Likhachev & Stentz,’09):

- avoids planning in a belief state-space- scales to large-scale problems in real-time- provides rigorous theoretical guarantees

Page 32: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim LikhachevMaxim Likhachev

Going Forward

• Explore option 2 (POMDP planning with uncertainty due to skills)

• Relax the assumption that each X maps onto unique S

• Apply the framework to few domains including navigation through

crowded areas

Carnegie Mellon University 33

Page 33: Learning in Heuristic Search-based Planning...Learning in Search-based Planning Carnegie Mellon University 5 Speeding up planning Learning cost function Going beyond the prior model

Maxim Likhachev 34Carnegie Mellon University

Thanks!

• Students & Staff:

– Ishani Chatterjee

– Ben Cohen

– Andrew Dornbush

– Victor Hwang

– Venkatraman Narayanan

– Michael Phillips

– Kalyan Vasudev

• Funding:

– ARL

– ONR

– Mitsubishi