Tractable Planning for Real-World Robotics: The promises and challenges of dealing with uncertainty. Joelle Pineau, Robotics Institute, Carnegie Mellon University. Stanford University, May 3, 2004.

Page 1: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Tractable Planning for Real-World Robotics:

The promises and challenges of dealing with uncertainty

Joelle Pineau

Robotics Institute

Carnegie Mellon University

Stanford University

May 3, 2004

Page 2: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Joelle Pineau Tractable Planning for Real-World Robotics 2

Robots in unstructured environments

Page 3: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

A vision for robotic-assisted health-care

• Moving things around

• Supporting inter-personal communication

• Calling for help in emergencies

• Monitoring Rx adherence & safety

• Providing information

• Reminding to eat, drink, & take meds

• Providing physical assistance

Page 4: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

What is hard about planning for real-world robotics?

• Generating a plan that has high expected utility.

• Switching between different representations of the world.

• Resolving conflicts between interfering jobs.

• Accomplishing jobs in changing, partly unknown environments.

• Handling percepts which are incomplete, ambiguous, outdated, incorrect.

* Highlights from a proposed robot grand challenge

Page 5: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Talk outline

• Uncertainty in plan-based robotics

• Partially Observable Markov Decision Processes (POMDPs)

• POMDP solver #1: Point-based value iteration (PBVI)

• POMDP solver #2: Policy-contingent abstraction (PolCA+)

Page 6: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Why use a POMDP?

• POMDPs provide a rich framework for sequential decision-making, which can model:

– Effect uncertainty

– State uncertainty

– Varying rewards across actions and goals

Page 7: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

POMDP applications in last decade

Robotics:

• Robust mobile robot navigation [Simmons & Koenig, 1995; + many more]

• Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003]

• Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996]

• High-level robot control [Pineau et al., 2003]

• Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]

Others:

• Machine maintenance [Puterman, 1994]

• Network troubleshooting [Thiébaux et al., 1996]

• Circuit testing [correspondence, 2004]

• Preference elicitation [Boutilier, 2002]

• Medical diagnosis [Hauskrecht, 1997]

Page 12: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

POMDP model

POMDP is n-tuple { S, A, Z, T, O, R }:

S = state set

A = action set

Z = observation set

T: Pr(s'|s,a) = state-to-state transition probabilities

O: Pr(z|s,a) = observation generation probabilities

R(s,a) = reward function

[Figure: dynamic model over two time steps. What goes on: s_{t-1}, s_t. What we see: z_{t-1}, z_t. What we infer: b_{t-1}, b_t. Also shown: actions a_{t-1}, a_t and rewards r_{t-1}, r_t.]
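The "what we infer" row can be made concrete with a Bayes-filter belief update. The following is a minimal sketch, not code from the talk: the function name `belief_update` and the nested-list layout of `T` and `O` are illustrative assumptions, and the observation model is assumed to be indexed by the state the robot arrives in.

```python
def belief_update(b, a, z, T, O):
    """Return the belief after taking action a and observing z.

    Assumed (hypothetical) table layout:
      T[a][s][s2] = Pr(s2 | s, a)
      O[a][s2][z] = Pr(z | s2, a), indexed by the arrival state s2
    """
    n = len(b)
    # Predict the next-state distribution, then weight by the observation likelihood.
    unnormalized = [
        O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Toy two-state model: one action that leaves the state unchanged,
# and a sensor that reports the true state 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
posterior = belief_update([0.5, 0.5], 0, 0, T, O)
print(posterior)
```

Starting from a uniform belief, a single observation shifts the probability mass toward the state that best explains it; the planner acts on this belief rather than on the unknown state.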

Page 13: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Examples of robot beliefs

[Figure: robot particles illustrating a uniform belief and a bi-modal belief]

Page 14: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Understanding the belief state

• A belief is a probability distribution over states

Where Dim(B) = |S|-1

– E.g. Let S={s1, s2}

[Figure: belief space drawn as a single axis P(s1) running from 0 to 1]

Page 15: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Understanding the belief state

• A belief is a probability distribution over states

Where Dim(B) = |S|-1

– E.g. Let S={s1, s2, s3}

[Figure: belief space drawn over two axes, P(s1) and P(s2), each running from 0 to 1]

Page 16: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Understanding the belief state

• A belief is a probability distribution over states

Where Dim(B) = |S|-1

– E.g. Let S={s1, s2, s3 , s4}

[Figure: belief space drawn over three axes, P(s1), P(s2) and P(s3), each running from 0 to 1]

Page 17: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

POMDP solving

Objective: Find the sequence of actions that maximizes the expected sum of rewards.

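$$V(b) = \max_{a \in A} \Big[ R(b,a) + \gamma \sum_{b' \in B} T(b,a,b') \, V(b') \Big]$$

Here V(b) is the value function, R(b,a) is the immediate reward, and the discounted sum over b' is the future reward.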

Page 18: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

POMDP value function

• Represent V(b) as the upper surface of a set of vectors.

– Each vector is a piece of the control policy (= action sequences).

– Dim(vector) = number of states.

• Modify / add vectors to update value fn (i.e. refine policy).

[Figure: V(b) plotted against P(s1) for a 2-state problem, with a belief b marked]
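Reading a value off the "upper surface of a set of vectors" amounts to taking the largest dot product between the belief and each vector. A small sketch under assumed names (`value`, `alphas`); the three vectors are made-up numbers for a 2-state problem, not values from the talk.

```python
def value(b, alphas):
    """Return (V(b), index of the maximizing vector).

    Each vector in `alphas` has one entry per state, so alpha . b is the
    expected value of following that piece of the policy from belief b.
    """
    scores = [sum(ai * bi for ai, bi in zip(alpha, b)) for alpha in alphas]
    best = max(range(len(alphas)), key=scores.__getitem__)
    return scores[best], best


# Hypothetical vectors for a 2-state problem.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
v_mid, i_mid = value([0.5, 0.5], alphas)    # uncertain belief
v_sure, i_sure = value([0.9, 0.1], alphas)  # confident belief
print(v_mid, i_mid, v_sure, i_sure)
```

Because each vector is tied to a piece of the policy, the index of the maximizing vector also tells the robot what to do at that belief.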

Page 23: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Optimal POMDP solving

• Simple problem: 2 states, 3 actions, 3 observations

Plan length    # vectors
0              1
1              3
2              27
3              2187
4              14,348,907

[Figure: V4(b) plotted against P(break-in), with vectors labelled Call-911, Investigate and Go-to-bed]

Page 24: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

The curse of history

Policy size grows exponentially with the planning horizon:

$$\Gamma_n = O\left( A \, \Gamma_{n-1}^{Z} \right)$$

Where Γ = policy size, n = planning horizon, A = # actions, Z = # observations
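The growth can be checked by iterating the worst-case recurrence directly. This is a sketch with an assumed helper name (`vector_counts`); it counts vectors before any pruning and starts from a single vector at horizon 0.

```python
def vector_counts(num_actions, num_observations, horizon):
    """Worst-case number of vectors at each plan length, without pruning.

    Applies count[n] = num_actions * count[n-1] ** num_observations,
    starting from a single vector at plan length 0.
    """
    counts = [1]
    for _ in range(horizon):
        counts.append(num_actions * counts[-1] ** num_observations)
    return counts


counts = vector_counts(num_actions=3, num_observations=2, horizon=4)
print(counts)  # [1, 3, 27, 2187, 14348907]
```

Called with 3 actions and 2 observations, the recurrence reproduces the counts tabulated on the "Optimal POMDP solving" slide (1, 3, 27, 2187, 14,348,907).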

Page 25: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

How many vectors for this problem?

10^4 (navigation) × 10^3 (dialogue) states

1000+ observations

100+ actions
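A rough worked answer, assuming the worst-case recurrence from the curse-of-history slide with no pruning, a single vector at horizon 0, and the lower ends of the ranges above (A = 100 actions, Z = 1000 observations):

```latex
\begin{align*}
|S| &= 10^{4} \times 10^{3} = 10^{7} \quad \text{(entries per vector)} \\
\Gamma_1 &= A \, \Gamma_0^{Z} = 100 \cdot 1^{1000} = 10^{2} \\
\Gamma_2 &= A \, \Gamma_1^{Z} = 100 \cdot \left(10^{2}\right)^{1000} = 10^{2002}
\end{align*}
```

Under these assumptions a plan of length two is already out of reach for exact value iteration.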

Page 26: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Talk outline

• Uncertainty in plan-based robotics

• Partially Observable Markov Decision Processes (POMDPs)

• POMDP solver #1: Point-based value iteration (PBVI)

• POMDP solver #2: Policy-contingent abstraction (PolCA+)

Page 27: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Exact solving assumes all beliefs are equally likely

[Figure: robot particles illustrating a uniform belief, a bi-modal belief and an N-modal belief]

INSIGHT: No sequence of actions and observations can produce this N-modal belief.

Page 31: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

A new algorithm: Point-based value iteration

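Approach:

Select a small set of belief points → Use well-separated, reachable beliefs

Plan for those belief points only → Learn value and its gradient

Pick action that maximizes value: $V(b) = \max_{\alpha \in \Gamma} \alpha \cdot b$

[Figure: V(b) plotted against P(s1), with belief points b1, b0, b2 and a query belief b]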
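"Plan for those belief points only" can be sketched as a backup that builds one new vector per belief point. This is an illustrative sketch rather than the talk's implementation: the names `dot` and `point_backup`, the nested-list tables and the toy model are assumptions, and the observation model is again indexed by the arrival state.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_backup(b, alphas, T, O, R, gamma):
    """One point-based backup at belief b; returns (new vector, action).

    Assumed layout: T[a][s][s2], O[a][s2][z], R[s][a].
    """
    n_states = len(b)
    n_obs = len(O[0][0])
    best = None
    for a in range(len(T)):
        vec = [R[s][a] for s in range(n_states)]
        for z in range(n_obs):
            # Project every current vector back through (a, z) ...
            projected = [
                [gamma * sum(T[a][s][s2] * O[a][s2][z] * alpha[s2]
                             for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in alphas
            ]
            # ... and keep only the projection that is best at this belief.
            g = max(projected, key=lambda p: dot(p, b))
            vec = [v + x for v, x in zip(vec, g)]
        if best is None or dot(vec, b) > dot(best[0], b):
            best = (vec, a)
    return best


# Toy model: 2 states, 2 actions, 2 uninformative observations.
T = [[[1.0, 0.0], [0.0, 1.0]]] * 2
O = [[[0.5, 0.5], [0.5, 0.5]]] * 2
R = [[1.0, 0.0], [0.0, 1.0]]          # action i pays off in state i
alphas = [[1.0, 0.0], [0.0, 1.0]]
new_alpha, action = point_backup([0.8, 0.2], alphas, T, O, R, gamma=0.9)
print(new_alpha, action)
```

The returned vector carries both the value at b and its gradient, which is what lets it generalize to nearby beliefs.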

Page 32: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

The anytime PBVI algorithm

• Alternate between:

1. Growing the set of belief points

2. Planning for those belief points

• Terminate when you run out of time or have a good policy.
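The alternation can be written as a short control loop. This is a skeleton only: `expand`, `improve` and `good_enough` are hypothetical callables supplied by the caller, and the stubs below exist just to make the loop runnable.

```python
import time


def anytime_pbvi(initial_beliefs, expand, improve, good_enough, budget_seconds):
    """Alternate planning and belief-set growth until time or quality runs out.

    expand(beliefs) -> larger belief set          (hypothetical callable)
    improve(beliefs, policy) -> better policy     (hypothetical callable)
    good_enough(policy) -> bool                   (hypothetical callable)
    """
    beliefs = list(initial_beliefs)
    policy = None
    deadline = time.monotonic() + budget_seconds
    while True:
        policy = improve(beliefs, policy)          # plan for the current points
        if good_enough(policy) or time.monotonic() >= deadline:
            return policy, beliefs
        beliefs = expand(beliefs)                  # grow the belief set


# Stub callables: the "policy" is just the number of belief points planned for.
policy, beliefs = anytime_pbvi(
    initial_beliefs=[[0.5, 0.5]],
    expand=lambda B: B + B,
    improve=lambda B, old: len(B),
    good_enough=lambda p: p >= 4,
    budget_seconds=5.0,
)
print(policy, len(beliefs))
```

Because a policy is available after every planning pass, the loop can be interrupted at any point and still return something usable.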

Page 33: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Complexity of value update

                      Exact update       PBVI
Time: projection      S^2 A Z Γ_n        S^2 A Z Γ_n
Time: sum             S A Γ_n^Z          S A Z Γ_n B
Size (# vectors)      A Γ_n^Z            B

where: S = # states, A = # actions, Z = # observations, Γ_n = # vectors at iteration n, B = # belief points
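Plugging numbers into these expressions shows where the saving comes from. The sketch below takes the complexity terms as read from the table (the exact sum and size terms exponential in Z, the PBVI terms polynomial); function names and the example sizes are illustrative assumptions.

```python
def exact_update_cost(S, A, Z, G):
    """Operation counts for one exact update, given G vectors at iteration n."""
    return {
        "projection": S * S * A * Z * G,
        "sum": S * A * G ** Z,
        "size": A * G ** Z,
    }


def pbvi_update_cost(S, A, Z, G, B):
    """Operation counts for one PBVI update over B belief points."""
    return {
        "projection": S * S * A * Z * G,
        "sum": S * A * Z * G * B,
        "size": B,
    }


# Example sizes (assumed): 2 states, 3 actions, 2 observations, 27 vectors, 10 points.
exact = exact_update_cost(S=2, A=3, Z=2, G=27)
pbvi = pbvi_update_cost(S=2, A=3, Z=2, G=27, B=10)
print(exact)
print(pbvi)
```

The projection step costs the same in both cases; the difference is that PBVI keeps at most one vector per belief point, so the vector set does not grow with the exponent Z.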

Page 34: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Theoretical properties of PBVI

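Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by:

$$\| V_n^B - V_n^* \|_\infty \le \frac{R_{\max} - R_{\min}}{(1-\gamma)^2} \, \max_{b' \in \Delta} \min_{b \in B} \| b - b' \|_1$$

Where Δ is the set of reachable beliefs and B is the set of selected belief points.

[Figure: V(b) plotted against P(s1), with belief points b1, b0, b2 and the approximation error (Err) marked between them]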
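The bound is the reward range, scaled by 1/(1-γ)^2, times how far the worst-covered reachable belief is from its nearest belief point. A sketch of that computation follows; the helper names are assumptions, and the reachable set, which is generally infinite, is replaced here by a small hand-picked sample.

```python
def l1(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))


def density(points, reachable_sample):
    """Largest L1 distance from a sampled reachable belief to its nearest point."""
    return max(min(l1(b, bp) for b in points) for bp in reachable_sample)


def pbvi_error_bound(points, reachable_sample, r_max, r_min, gamma):
    """(r_max - r_min) * density / (1 - gamma)^2, on a finite sample only."""
    eps = density(points, reachable_sample)
    return (r_max - r_min) * eps / (1.0 - gamma) ** 2


points = [[0.5, 0.5]]
reachable_sample = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]   # assumed sample
eps = density(points, reachable_sample)
bound = pbvi_error_bound(points, reachable_sample, r_max=1.0, r_min=0.0, gamma=0.5)
print(eps, bound)
```

Adding a belief point near the worst-covered belief shrinks the density term, which is the motivation for choosing well-separated points.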

Page 35: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

The anytime PBVI algorithm

• Alternate between:

1. Growing the set of belief points

2. Planning for those belief points

• Terminate when you run out of time or have a good policy.

Page 38: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PBVI’s belief selection heuristic

1. Leverage insight from policy search methods:

– Focus on reachable beliefs.

2. Focus on high probability beliefs:

– Consider all actions, but stochastic observation choice.

3. Use the error bound on point-based value updates:

– Select well-separated beliefs, rather than near-by beliefs.

[Figure: belief b on the P(s1) axis with successor beliefs b_{a1,z2} and b_{a2,z1}, reached via (a1,z2) and (a2,z1)]
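The three heuristics combine into one expansion step: simulate every action from each belief, sample a single observation per action, and keep the successor farthest from the current set. The sketch below is one reading of that description, with assumed names and table layout; it is not taken from the talk.

```python
import random


def l1(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))


def belief_update(b, a, z, T, O):
    # Assumed layout: T[a][s][s2], O[a][s2][z] (indexed by arrival state).
    n = len(b)
    w = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    total = sum(w)
    return [x / total for x in w]


def sample(dist, rng):
    return rng.choices(range(len(dist)), weights=dist)[0]


def expand(beliefs, T, O, rng):
    """Add at most one new reachable belief per existing belief point."""
    new_points = []
    for b in beliefs:
        candidates = []
        for a in range(len(T)):              # consider all actions
            s = sample(b, rng)               # sample a state from the belief
            s2 = sample(T[a][s], rng)        # simulate the transition
            z = sample(O[a][s2], rng)        # stochastic observation choice
            candidates.append(belief_update(b, a, z, T, O))
        known = beliefs + new_points
        # Keep the candidate that is farthest (L1) from every belief already held.
        far = max(candidates, key=lambda c: min(l1(c, k) for k in known))
        if min(l1(far, k) for k in known) > 0.0:
            new_points.append(far)
    return beliefs + new_points


# Toy model: action 0 senses (85% accurate), action 1 gives no information.
T = [[[1.0, 0.0], [0.0, 1.0]]] * 2
O = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]]]
grown = expand([[0.5, 0.5]], T, O, random.Random(0))
print(grown)
```

In this toy model the uninformative action returns the same belief, so only the sensing action contributes a new, well-separated point.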

Page 39: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Classes of value function approximations

1. No belief [Littman et al., 1995]

2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]

3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]

4. Sample belief points [Poon, 2001; Pineau et al., 2003]

Page 40: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Performance on well-known POMDPs

                                    REWARD                   TIME                     # Belief points
Method                              Maze1  Maze2  Maze3      Maze1  Maze2  Maze3      Maze1  Maze2  Maze3
No belief [Littman et al., 1995]    0.20   0.11   0.26       0.19   1.44   0.51       -      -      -
Grid [Brafman, 1997]                0.94   -      -          -      -      -          174    337    -
Compressed [Poupart et al., 2003]   0.00   0.07   0.11       24hrs  24hrs  24hrs      -      -      -
Sample [Poon, 2001]                 2.30   0.35   0.53       12166  27898  450        660    1840   300
PBVI [Pineau et al., 2003]          2.25   0.34   0.53       3448   360    288        470    95     86

Maze1: 36 states; Maze2: 92 states; Maze3: 60 states

Page 41: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PBVI in the Nursebot domain

Objective: Find the patient!

State space = RobotPosition × PatientPosition

Observation space = RobotPosition + PatientFound

Action space = {North, South, East, West, Declare}

Page 42: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PBVI performance on find-the-patient domain

No Belief: patient found 17% of trials

PBVI: patient found 90% of trials

Page 43: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Validation of PBVI’s belief expansion heuristic

[Figure: comparison of belief expansion strategies; legend: Greedy, PBVI, No Belief, Random]

Find-the-patient domain: 870 states, 5 actions, 30 observations

Page 44: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Policy assuming full observability

You find some: (25%)

You lose some: (75%)

Page 45: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PBVI Policy with 3141 belief points

You find some: (81%)

You lose some: (19%)

Page 46: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PBVI Policy with 643 belief points

You find some: (22%)

You lose some: (78%)

Page 47: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Highlights of the PBVI algorithm

• Algorithmic:

– New belief sampling algorithm.

– Efficient heuristic for belief point selection.

– Anytime performance.

• Experimental:

– Outperforms previous value approximation algorithms on known problems.

– Solves new larger problem (1 order of magnitude increase in problem size).

• Theoretical:

– Bounded approximation error.

[ Pineau, Gordon & Thrun, IJCAI 2003. Pineau, Gordon & Thrun, NIPS 2003. ]

Page 48: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Back to the grand challenge

How can we go from 10^3 states to real-world problems?

Page 49: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Talk outline

• Uncertainty in plan-based robotics

• Partially Observable Markov Decision Processes (POMDPs)

• POMDP solver #1: Point-based value iteration (PBVI)

• POMDP solver #2: Policy-contingent abstraction (PolCA+)

Page 50: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Structured POMDPs

Many real-world decision-making problems exhibit structure inherent to the problem domain.

[Figure: task hierarchy with a High-level controller at the top; Navigation, Cognitive support and Social interaction below it; then Move and AskWhere; then Left, Right, Forward and Backward]

Page 51: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Structured POMDP approaches

Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001]

– Idea: Represent state space with multi-valued state features.

– Insight: Independencies between state features can be leveraged to overcome the curse of dimensionality.

Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000]

– Idea: Exploit domain knowledge to divide one POMDP into many smaller ones.

– Insight: Smaller action sets further help overcome the curse of history.

Page 52: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

A hierarchy of POMDPs

[Figure: task hierarchy; legend: subtask, abstract action, primitive action]

Act
    ExamineHealth
        VerifyFluids
        VerifyMeds
    Navigate
        Move
            North, South, East, West
        ClarifyGoal

Page 56: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PolCA+: Planning with a hierarchy of POMDPs

[Figure: subtask Navigate with children Move and ClarifyGoal; Move with children North, South, East, West]

Step 1: Select the action set, e.g. A_Move = {N,S,E,W}

Step 2: Minimize the state set, e.g. S_Move = {s1,s2}

Step 3: Choose parameters {b_h, T_h, O_h, R_h}

Step 4: Plan task h, giving the plan π_h

ACTIONS: North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds

STATE FEATURES: X-position, Y-position, X-goal, Y-goal, HealthStatus
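Step 1 and the order in which subtasks get planned can be sketched with a plain dictionary. Everything here is illustrative: the `HIERARCHY` table follows the task names on the hierarchy slide, the placement of VerifyFluids and VerifyMeds under ExamineHealth is an assumption about that figure, and the bottom-up ordering is an assumption consistent with abstractions that depend on lower-level policies.

```python
# Hypothetical encoding of the task hierarchy: subtask -> its children.
# Names that never appear as keys are primitive actions.
HIERARCHY = {
    "Act": ["ExamineHealth", "Navigate"],
    "ExamineHealth": ["VerifyFluids", "VerifyMeds"],   # assumed placement
    "Navigate": ["Move", "ClarifyGoal"],
    "Move": ["North", "South", "East", "West"],
}


def action_set(task):
    """Step 1: the actions available to a subtask are exactly its children."""
    return HIERARCHY[task]


def planning_order(root):
    """List subtasks so that every subtask comes after its own subtasks."""
    order = []

    def visit(task):
        for child in HIERARCHY[task]:
            if child in HIERARCHY:      # child is itself a subtask
                visit(child)
        order.append(task)

    visit(root)
    return order


move_actions = action_set("Move")
order = planning_order("Act")
print(move_actions, order)
```

Each subtask sees only its own few actions, which is where the reduction in action-set size, and hence in the curse of history, comes from.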

Page 57: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

PolCA+ in the Nursebot domain

• Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments.

Page 58: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Performance measure

[Figure: cumulative reward (axis from -2000 to 14000) against time steps (0 to 1200) for PolCA+, PolCA and QMDP; annotations: Hierarchy + Belief, Execution Steps]

Page 59: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Comparing user performance

[Figure: comparison of the POMDP policy and the No Belief policy; values shown: 0.1, 0.1, 0.18]

Page 60: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Visit to the nursing home

Page 61: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Highlights of the PolCA+ algorithm

• Algorithmic:

– New hierarchical approach for POMDP framework.

– POMDP-specific state and observation abstraction methods.

• Experimental:

– First instance of POMDP-based high-level robot controller.

– Novel application of POMDPs to robust dialogue management.

• Theoretical:

– For special case (fully observable), guarantees recursive optimality.

[ Pineau, Gordon & Thrun, UAI 2003. Pineau et al., RAS 2003. Roy, Pineau & Thrun, ACL 2001]

Page 62: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Future work

PBVI:

• How can we handle domains with multi-valued state features?

• Can we leverage dimensionality reduction?

• Can we find better ways to pick belief points?

PolCA+:

• Can we automatically learn hierarchies?

• How can we learn (or do without) pseudo-reward functions?

Page 63: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Questions?

Project information: www.cs.cmu.edu/~nursebot

Navigation software: www.cs.cmu.edu/~carmen

Papers and more: www.cs.cmu.edu/~jpineau

Collaborators: Geoffrey Gordon, Judith Matthews, Michael Montemerlo, Martha Pollack, Nicholas Roy, Sebastian Thrun

Page 64: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Two types of uncertainty

Effect → Stochastic action effects

State → Partial and noisy sensor information

Page 65: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Example: Effect uncertainty

[Figure: a motion action applied at a start position yields a distribution over possible next-step positions]

Page 67: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Example: State uncertainty

Effect → Stochastic action effects

State → Partial and noisy sensor information

Model → Inaccurate parameterization of the environment

Agent → Unknown behaviour of other agents

[Figure: start position and distribution over possible next-step positions]

Page 68: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Validation of PBVI’s belief expansion heuristic

[Figure: belief expansion comparison; legend includes No Belief]

Hallway domain: 60 states, 5 actions, 20 observations

Page 69: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

State space reduction

[Figure: bar chart of # states (axis from 0 to 4500) for NoAbs, PolCA and PolCA+, broken down by subtask: subInform, subMove, subContact, subRest, subAssist, subRemind, act; columns annotated No hierarchy and PolCA+]

Page 70: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Future directions

• Improving POMDP planning

– sparser belief space sampling, ordered value updating, dimensionality reduction, continuous / hybrid domains

• Addressing two more types of uncertainty:

1. Effect

2. State

3. Model

4. Agent

• Exploring new applications