Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding
Martin Sachenbacher, July 1, 2003

Page 1: Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding

Martin Sachenbacher
July 1, 2003

Page 2: Exact vs. Approximate ME

Problems of ME with an incomplete belief state:
– Dead ends (no solutions)
– Incorrect leading solutions
– Incorrect probabilities of solutions

Usefulness of ME with a complete belief state:
– As an accuracy reference
– As a performance reference
– As a starting point for approximations

Key: compact representation of the belief state
– Map to a semiring-based CSP
– Decompose the hypergraph into a hypertree
– Encode tree nodes symbolically as ADDs

Page 3: Outline

– SCSPs (Semiring-based CSPs)
– Mapping State Constraints to SCSPs
– Mapping Transition Constraints to SCSPs
– ADDs (Algebraic Decision Diagrams)
– Hypertree Decompositions of SCSPs
– Solving Tree-structured SCSPs
– Exact Mode Estimation for POMDPs as Decomposition/ADD-based SCSP Solving
– Demonstration: Two Switches Example

Page 4: SCSPs (Semiring-based CSPs)

– Generalization of CSPs [Bistarelli et al. 97]
– Domain D, variables V, set S, type T ⊆ V
– Constraints are mappings D^k → S
– Operations ⊗ (for join) and ⊕ (for projection) on S
– (S, ⊕, ⊗, 0, 1) must form a c-semiring
– Dynamic programming is applicable to all SCSPs
– Examples:
  – ({0,1}, ∨, ∧, 0, 1): Classical CSPs
  – (R+, min, +, +∞, 0): Weighted CSPs
  – ([0,1], max, *, 0, 1): Probabilistic CSPs
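The three example semirings can be written down directly. The following is a minimal sketch, not the representation used in the talk; the names `Semiring`, `combine`, and `project` are hypothetical. `times` plays the role of the join operation and `plus` the role of the projection operation.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]    # additive operation, used for projection
    times: Callable[[Any, Any], Any]   # multiplicative operation, used for join
    zero: Any                          # neutral element of plus
    one: Any                           # neutral element of times


# The three instances listed on the slide.
CLASSICAL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
WEIGHTED = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)
PROBABILISTIC = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def combine(sr, values):
    """Join: fold the values of several constraints with the times operation."""
    return reduce(sr.times, values, sr.one)


def project(sr, values):
    """Projection: fold the values of eliminated tuples with the plus operation."""
    return reduce(sr.plus, values, sr.zero)
```

For instance, `project(PROBABILISTIC, [0.2, 0.7])` keeps the more likely of two tuples, while `project(WEIGHTED, [3, 1])` keeps the cheaper one; the surrounding algorithm stays the same.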

Page 5: Encoding States as SCSPs

Example: Or-Gate, P(Or=ok) = 99%, P(Or=fty) = 1%

  xt   in1  in2  out  f
  ok   lo   lo   lo   0.99
  ok   lo   hi   hi   0.99
  ok   hi   lo   hi   0.99
  ok   hi   hi   hi   0.99
  fty  *    *    *    0.01

[Figure: Or-gate symbol (≥ 1), labeled Or]
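The Or-gate table can be read as a function from assignments to semiring values. The sketch below is one possible encoding, assuming that `*` is a wildcard and that tuples not listed in the table get the semiring zero (0.0); `OR_GATE` and `evaluate` are hypothetical names.

```python
# Rows of the Or-gate state constraint: (xt, in1, in2, out) -> f.
# "*" matches any value, as in the table on the slide.
OR_GATE = [
    (("ok", "lo", "lo", "lo"), 0.99),
    (("ok", "lo", "hi", "hi"), 0.99),
    (("ok", "hi", "lo", "hi"), 0.99),
    (("ok", "hi", "hi", "hi"), 0.99),
    (("fty", "*", "*", "*"), 0.01),
]


def evaluate(constraint, assignment, zero=0.0):
    """Return the value of the first matching row, or zero if no row matches."""
    for pattern, value in constraint:
        if all(p == "*" or p == a for p, a in zip(pattern, assignment)):
            return value
    return zero  # assumption: unlisted tuples are inconsistent
```

A working gate with inputs `lo`, `hi` and output `lo` evaluates to 0.0, whereas the same values under mode `fty` evaluate to 0.01.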

Page 6: Encoding Observations as SCSPs

Example: (Probabilistic) Observation

[Figure: distribution P over the values 0, 1, 2, 3 of xi]

  xi   f
  0    0.6
  1    0.9
  2    0.3
  3    0.0

Page 7: Encoding Transitions as SCSPs

Example: (Probabilistic) CCA

[Figure: transition function of an automaton with states 0 and 1; the arcs labeled cmd=on and cmd=off each carry probability 0.9]

  xt   cmd  xt+1  f
  0    off  0     0.9
  0    on   0     0.1
  0    off  1     0.1
  0    on   1     0.9
  1    off  0     0.9
  1    on   0     0.1
  1    off  1     0.1
  1    on   1     0.9
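To illustrate how such a transition constraint is used, the sketch below multiplies a belief over xt with the table and sums out xt, which yields the belief over xt+1. It works on plain dictionaries rather than ADDs, and `expand` is a hypothetical name.

```python
# Transition constraint from the table: (xt, cmd, xt+1) -> f.
TRANSITION = {
    (0, "off", 0): 0.9, (0, "on", 0): 0.1,
    (0, "off", 1): 0.1, (0, "on", 1): 0.9,
    (1, "off", 0): 0.9, (1, "on", 0): 0.1,
    (1, "off", 1): 0.1, (1, "on", 1): 0.9,
}


def expand(belief, cmd):
    """Join the belief over xt with the transition table, then sum out xt."""
    next_belief = {0: 0.0, 1: 0.0}
    for (x, c, x_next), p in TRANSITION.items():
        if c == cmd:
            next_belief[x_next] += belief[x] * p
    return next_belief
```

Starting from a belief that puts all mass on state 0, `expand({0: 1.0, 1: 0.0}, "on")` moves 0.9 of it to state 1.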

Page 8: Algebraic Decision Diagrams

– ADDs: symbolic (graph-based) representation of functions {0,1}^n → R
– Generalization of BDDs (functions {0,1}^n → {0,1})
– Canonicity of representation (as for BDDs)
– Efficient package: CUDD

[Figure: example ADD with decision nodes A, B, C and leaves 0, 1, 2, 3]

Page 9: ADD Join Operations

– Multiplication, addition, maximum, …
– Generalization of BDD operations

  ABC   f   g   f+g   f*g   max(f,g)   5*f   f>1
  000   0   3   3     0     3          0     0
  001   1   2   3     2     2          5     0
  010   1   0   1     0     1          5     0
  011   2   1   3     2     2          10    1
  100   1   0   1     0     1          5     0
  101   2   0   2     0     2          10    1
  110   2   0   2     0     2          10    1
  111   3   1   4     3     3          15    1
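The pointwise operations in the table can be reproduced on explicit truth tables. This is only a sketch of the semantics: a real ADD package such as CUDD applies the operation on the shared graph instead of enumerating all 2^n rows. The helper name `apply_op` is hypothetical.

```python
import operator
from itertools import product

# Rows 000 .. 111 over the variables A, B, C, in the order of the table.
ROWS = list(product((0, 1), repeat=3))
f = dict(zip(ROWS, [0, 1, 1, 2, 1, 2, 2, 3]))
g = dict(zip(ROWS, [3, 2, 0, 1, 0, 0, 0, 1]))


def apply_op(op, u, v):
    """Combine two functions row by row with a binary operation."""
    return {row: op(u[row], v[row]) for row in u}


f_plus_g = apply_op(operator.add, f, g)
f_times_g = apply_op(operator.mul, f, g)
f_max_g = apply_op(max, f, g)
five_f = {row: 5 * value for row, value in f.items()}           # scalar multiple
f_gt_1 = {row: int(value > 1) for row, value in f.items()}      # threshold to 0/1
```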

Page 10: Example

Summation of ADD f, ADD g

[Figure: three ADDs over the variables A, B, C. An operand with leaves 3, 2, 1, 0 plus an operand with leaves 0, 1, 2, 3 equals the sum ADD with leaves 4, 3, 2, 1.]

Page 11: ADD Projection Operations

Σ(f,X) (and Π(f,X)) are obtained by summing (multiplying) the values of tuples that differ only w.r.t. X

  ABC   f
  000   0
  001   1
  010   1
  011   2
  100   1
  101   2
  110   2
  111   3

  AB   Σ(f,{C})   Π(f,{C})
  00   1          0
  01   3          2
  10   3          2
  11   5          6

Page 12: ADD Projection Operations

For optimization, we require an operation max(f,X) that yields the maximum value of tuples differing only w.r.t. X

  ABC   f
  000   0
  001   1
  010   1
  011   2
  100   1
  101   2
  110   2
  111   3

  AB   Σ(f,{C})   Π(f,{C})   max(f,{C})
  00   1          0          1
  01   3          2          2
  10   3          2          2
  11   5          6          3

Not part of CUDD, but easy to implement as a variant of Σ/Π(f,X).
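All three projections follow the same pattern: group the rows that agree on the remaining variables and fold each group with one operation. The sketch below shows this on an explicit table (not on an ADD); `project_out` is a hypothetical name.

```python
import operator
from functools import reduce
from itertools import product

VARS = ("A", "B", "C")
f = dict(zip(product((0, 1), repeat=3), [0, 1, 1, 2, 1, 2, 2, 3]))


def project_out(op, table, variables, remove):
    """Eliminate the variables in `remove`, folding merged rows with `op`."""
    keep = [i for i, name in enumerate(variables) if name not in remove]
    groups = {}
    for row, value in table.items():
        key = tuple(row[i] for i in keep)
        groups.setdefault(key, []).append(value)
    return {key: reduce(op, values) for key, values in groups.items()}


sum_c = project_out(operator.add, f, VARS, {"C"})   # sum over C
prod_c = project_out(operator.mul, f, VARS, {"C"})  # product over C
max_c = project_out(max, f, VARS, {"C"})            # maximum over C
```

Only the folding operation differs between the three calls, which is why the max variant is a small change to the sum and product variants.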

Page 13: Solving SCSPs using Decomposition

– Transform the SCSP into a hypertree H = (T, χ, λ)
– Compute constraint φ(v) for each node v
– Bottom-up phase for computing values
– Top-down phase for extracting solutions

Page 14: Pseudocode for Bottom-Up Phase

Function solve(v)
  For Each child ∈ children(v)
    solve(child)
    φ(v) ← φ(v) ⊗ max(φ(child), χ(child) \ χ(v))
  Next child
  Return φ(v)

Generalization of the (semi-)join operation

Page 15: Example

Boolean Polycell

[Figure: Boolean polycell circuit with Or-gates Or1, Or2, Or3, And-gates And1, And2, internal variables X, Y, Z, and values A = 1, B = 1, C = 1, D = 1, E = 0, F = 0, G = 1]

Page 16: Example

Hypertree Decomposition of Boolean Polycell

v0 (root), variables O3 A1 C E F X Y Z (tuples shown):

  O3  A1  C E F X Y Z   U
  ok  ok  1 0 0 0 0 1   .98505
  ok  ok  1 0 0 0 1 1   .98505
  ok  ok  1 0 0 1 0 1   .98505

v1, variables A2 G Y Z (shares Y, Z with v0):

  A2   G Y Z   U
  ok   1 1 1   .995
  fty  1 1 1   .005
  fty  1 0 1   .005
  fty  1 1 0   .005
  fty  1 0 0   .005

v2, variables O2 B D Y (shares Y with v0):

  O2   B D Y   U
  ok   1 1 1   .99
  fty  1 1 1   .01
  fty  1 1 0   .01

v3, variables O1 A C X (shares C, X with v0):

  O1   A C X   U
  ok   1 1 1   .99
  fty  1 1 1   .01
  fty  1 1 0   .01

Page 17: Example

Initial φ(v0), variables O3 A1 C E F X Y Z (tuples shown):

  O3   A1  C E F X Y Z   U
  ok   ok  1 0 0 0 0 1   .98505
  ok   ok  1 0 0 0 1 1   .98505
  ok   ok  1 0 0 1 0 1   .98505
  fty  ok  1 0 0 0 1 1   .00995
  fty  ok  1 0 0 0 1 0   .00995
  fty  ok  1 0 0 1 0 0   .00995
  fty  ok  1 0 0 1 0 1   .00995
  …

Further leaf values: U=.00495, U=.00005

ADD with 20 nodes, 5 leaves

Page 18: Example

After multiplication with max(φ(v1), {A2, G})

φ(v0), variables O3 A1 C E F X Y Z (tuples shown):

  O3   A1  C E F X Y Z   U
  ok   ok  1 0 0 0 1 1   .98012
  fty  ok  1 0 0 0 1 1   .00990
  ok   ok  1 0 0 0 0 1   .00492
  ok   ok  1 0 0 1 0 1   .00492
  …

Further leaf values: U=4.9E-5, U=2.4E-5, U=2.5E-7

ADD with 28 nodes, 7 leaves

Page 19: Example

After multiplication with max(φ(v2), {O2, B, D})

φ(v0), variables O3 A1 C E F X Y Z (tuples shown):

  O3   A1   C E F X Y Z   U
  ok   ok   1 0 0 0 1 1   .97032
  fty  ok   1 0 0 0 1 1   .00980
  ok   fty  1 0 0 0 1 1   .00487
  …

Further leaf values: U=4.9E-5, U=4.9E-7, U=2.4E-7, U=2.5E-9

ADD with 30 nodes, 8 leaves

Page 20: Example

After multiplication with max(φ(v3), {O1, A})

φ(v0), variables O3 A1 C E F X Y Z (tuples shown):

  O3   A1   C E F X Y Z   U
  ok   ok   1 0 0 0 1 1   .00970
  ok   fty  1 0 0 1 1 1   .00482
  fty  ok   1 0 0 0 1 1   9.8E-5
  …

Further leaf values: U=4.8E-5, U=4.9E-7, U=2.4E-7, U=4.9E-9, U=2.4E-9, U=2.5E-11

ADD with 35 nodes, 10 leaves

Best Solution: Umax = .0097
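The bottom-up phase on the polycell can be replayed with explicit tables in place of ADDs. The sketch below makes several assumptions that are inferred from the slides rather than stated on them: the gate wiring follows the node variable sets (Or1: A, C → X; Or2: B, D → Y; Or3: C, E → Z; And1: X, Y → F; And2: Y, Z → G), the mode probabilities are .99/.01 for Or-gates and .995/.005 for And-gates, and a faulty gate is unconstrained. The helper names `gate`, `join`, and `max_out` are hypothetical.

```python
from itertools import product

P_MODE = {"or": {"ok": 0.99, "fty": 0.01}, "and": {"ok": 0.995, "fty": 0.005}}
FUNC = {"or": lambda a, b: a | b, "and": lambda a, b: a & b}
OBS = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 1}


def gate(kind, mode, in1, in2, out):
    """Table constraint of one gate, restricted to the observed values."""
    names = (mode, in1, in2, out)
    rows = {}
    for row in product(("ok", "fty"), (0, 1), (0, 1), (0, 1)):
        m, a, b, o = row
        if m == "ok" and o != FUNC[kind](a, b):
            continue  # a working gate must compute its function
        if any(OBS.get(n, v) != v for n, v in zip(names[1:], row[1:])):
            continue  # row contradicts an observed value
        rows[row] = P_MODE[kind][m]
    return names, rows


def join(c1, c2):
    """Multiply two constraints on all pairs of rows that agree on shared variables."""
    (v1, t1), (v2, t2) = c1, c2
    names = v1 + tuple(n for n in v2 if n not in v1)
    rows = {}
    for r1, p1 in t1.items():
        a1 = dict(zip(v1, r1))
        for r2, p2 in t2.items():
            a2 = dict(zip(v2, r2))
            if all(a1.get(n, x) == x for n, x in a2.items()):
                merged = {**a1, **a2}
                rows[tuple(merged[n] for n in names)] = p1 * p2
    return names, rows


def max_out(c, remove):
    """Project away the variables in `remove`, keeping the maximum value."""
    names, table = c
    keep = tuple(n for n in names if n not in remove)
    rows = {}
    for r, p in table.items():
        key = tuple(x for n, x in zip(names, r) if n not in remove)
        rows[key] = max(rows.get(key, 0.0), p)
    return keep, rows


phi = {
    "v0": join(gate("or", "O3", "C", "E", "Z"), gate("and", "A1", "X", "Y", "F")),
    "v1": gate("and", "A2", "Y", "Z", "G"),
    "v2": gate("or", "O2", "B", "D", "Y"),
    "v3": gate("or", "O1", "A", "C", "X"),
}
children = {"v0": ["v1", "v2", "v3"], "v1": [], "v2": [], "v3": []}


def solve(v):
    """Bottom-up phase: absorb each child's max-projection into the parent."""
    for child in children[v]:
        result = solve(child)
        private = set(result[0]) - set(phi[v][0])  # variables not shared with v
        phi[v] = join(phi[v], max_out(result, private))
    return phi[v]


root_vars, root_table = solve("v0")
u_max = max(root_table.values())
```

Under these assumptions `u_max` comes out at about .0097, in line with the best solution stated on the slide, and the root table holds 21 tuples.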

Page 21: Pseudocode for Top-Down Phase

Function extractSolutions(vroot)
  E ← edges(vroot)
  σ ← φ(vroot)
  σ ← max(σ, vars(σ) \ (decvars(σ) ∪ vars(E)))
  While E ≠ ∅ Do
    e ← choose(E)
    v ← son-node(e)
    E ← (E \ e) ∪ edges(v)
    0-1 (0)
    div ← max(0-1 (v), vars())                       “Divisor”
    σ ← (σ ⊗ φ(v)) ⊗⁻¹ div
    σ ← max(σ, vars(σ) \ (decvars(σ) ∪ vars(E)))     Restrict to decision and shared variables
  End While

No search queue necessary

Page 22: Example

Initial σ = max(φ(vroot), {E, F})

Variables O3 A1 C X Y Z (tuples shown):

  O3   A1   C X Y Z   U
  ok   ok   1 0 1 1   .00970
  ok   fty  1 1 1 1   .00482
  fty  ok   1 0 1 1   9.8E-5
  …

Further leaf values: U=4.8E-5, U=4.9E-7, U=2.4E-7, U=4.9E-9, U=2.4E-9, U=2.5E-11

ADD with 21 tuples, 33 nodes, 10 leaves

Page 23: Example

After processing edge(v0,v3)

Variables O1 O3 A1 Y Z (tuples shown):

  O1   O3   A1   Y Z   U
  fty  ok   ok   1 1   .00970
  ok   ok   fty  1 1   .00482
  fty  fty  ok   1 1   9.8E-5
  …

Further leaf values: U=4.8E-5, U=4.9E-7, U=2.4E-7, U=4.9E-9, U=2.4E-9, U=2.5E-11

ADD with 21 tuples, 32 nodes, 10 leaves

Page 24: Example

After processing edge(v0,v2)

Variables O1 O2 O3 A1 Y Z (tuples shown):

  O1   O2   O3   A1   Y Z   U
  fty  ok   ok   ok   1 1   .00970
  ok   ok   ok   fty  1 1   .00482
  fty  fty  ok   ok   1 1   9.8E-5
  fty  ok   fty  ok   1 1   9.8E-5
  …

Further leaf values: U=4.8E-5, U=9.9E-7, U=4.9E-7, U=2.5E-11

ADD with 30 tuples, 47 nodes, 11 leaves

Page 25: Example

After processing edge(v0,v1)

Variables O1 O2 O3 A1 A2 (tuples shown):

  O1   O2   O3   A1   A2   U
  fty  ok   ok   ok   ok   .00970
  ok   ok   ok   fty  ok   .00482
  fty  fty  ok   ok   ok   9.8E-5
  fty  ok   fty  ok   ok   9.8E-5
  …

Further leaf values: U=4.8E-5, U=2.4E-5, U=9.9E-7, U=2.5E-11

ADD with 26 tuples, 35 nodes, 12 leaves

#Solutions = 26

Easy to focus on leading solutions.
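As a cross-check of the extracted solutions, the polycell is small enough to enumerate by brute force. The sketch below assumes the same wiring and mode probabilities as inferred from the node variable sets and U values of the example (they are not stated explicitly on the slides) and treats faulty gates as unconstrained; `GATES`, `consistent`, and `ranked` are hypothetical names.

```python
from itertools import product

P_MODE = {"or": {"ok": 0.99, "fty": 0.01}, "and": {"ok": 0.995, "fty": 0.005}}
FUNC = {"or": lambda a, b: a | b, "and": lambda a, b: a & b}
OBS = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 1}

# (mode variable, gate type, input 1, input 2, output); wiring is an assumption.
GATES = [
    ("O1", "or", "A", "C", "X"),
    ("O2", "or", "B", "D", "Y"),
    ("O3", "or", "C", "E", "Z"),
    ("A1", "and", "X", "Y", "F"),
    ("A2", "and", "Y", "Z", "G"),
]


def consistent(modes, values):
    """Every gate in mode ok must compute its function; fty gates are free."""
    return all(
        mode == "fty" or values[out] == FUNC[kind](values[i1], values[i2])
        for mode, (_, kind, i1, i2, out) in zip(modes, GATES)
    )


diagnoses = {}
for modes in product(("ok", "fty"), repeat=len(GATES)):
    # A mode assignment is a solution if some values of X, Y, Z make it consistent.
    if any(consistent(modes, {**OBS, "X": x, "Y": y, "Z": z})
           for x, y, z in product((0, 1), repeat=3)):
        u = 1.0
        for mode, (_, kind, *_rest) in zip(modes, GATES):
            u *= P_MODE[kind][mode]
        diagnoses[modes] = u

# Mode assignments in the order O1, O2, O3, A1, A2, most probable first.
ranked = sorted(diagnoses.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions the enumeration finds 26 consistent mode assignments, led by O1 = fty (about .0097) and then A1 = fty (about .0048), which agrees with the table above.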

Page 26: Application: Exact ME for POMDPs

– Given: POMDP (feasible states, observables, control actions, transitions), observations
– Approach: complete representation of the belief state (through decomposition and symbolic encoding)
– Benefit: allows for exploiting the Markov property

[Figure: belief state over states S0, S1, …, Sn at time t and at time t+1]

Page 27: Algorithm: Exact ME for POMDPs

Offline:
– Construct the hypertree
– Construct State-ADDs for each node
– Construct Transition-ADDs for each node

Repeat for each time step:
– Multiply nodes with Obs-ADDs (“Condition on Observations”)
– Establish consistency in the tree (Bottom-up)
– Extract leading solution(s) from the tree (Top-down)
– Multiply nodes with Transition-ADDs, project on xt+1, set xt = xt+1, multiply with State-ADDs (“Transition Expansion”)

Complexity: polynomial for hypertrees of bounded width

Page 28: Example

Adapted from Jim Kurien’s thesis

[Figure: two switches Sw1 and Sw2, with inputs labeled hi, feeding an Or-gate (≥ 1)]

– t0: Sw1.cmd = on
– t1: Or.out = lo, Sw1.cmd = idl, Sw2.cmd = on
– t2: Or.out = lo

Switches are more likely to fail than the Or-gate

Page 29: Example

Switch Model

[Figure: switch mode automaton with modes on, off, and fty. Arcs labeled cmd=on, cmd=off, and idl carry probability 0.95, arcs into fty carry probability 0.05, and fty persists with probability 1.0. Each mode is annotated with its constraint on t1 and t2.]

Page 30: Example

Switch Model

  xt   t1  t2  f
  on   lo  lo  1.0
  on   hi  hi  1.0
  off  *   *   1.0
  fty  *   *   1.0

  xt   cmd  xt+1  f
  on   on   on    0.95
  on   off  off   0.95
  on   idl  on    0.95
  on   *    fty   0.05
  off  on   on    0.95
  off  off  off   0.95
  off  idl  off   0.95
  off  *    fty   0.05
  fty  *    fty   1.0

Page 31: Example

Or-Gate Model

[Figure: Or-gate mode automaton with modes ok and fty. Mode ok persists with probability 0.99 and changes to fty with probability 0.01; fty persists with probability 1.0. Mode ok is annotated with the Or truth table over in1, in2, out; mode fty with true.]

  xt   in1  in2  out  f
  ok   lo   lo   lo   1.0
  ok   lo   hi   hi   1.0
  ok   hi   lo   hi   1.0
  ok   hi   hi   hi   1.0
  fty  *    *    *    1.0

  xt   xt+1  f
  ok   ok    0.99
  ok   fty   0.01
  fty  fty   1.0

Page 32: Example

Initial belief state (chosen):
– p(Sw=on) = p(Sw=off) = 0.475, p(Sw=fty) = 0.05
– p(Or=ok) = 0.99, p(Or=fty) = 0.01

Observations/Commands:
– t0: Sw1.cmd=on
– t1: Or.out=lo, Sw1.cmd=idl, Sw2.cmd=on
– t2: Or.out=lo

Leading Solutions:
– t0: Sw1=on/off, Sw2=on/off, Or=ok
– t1: Sw1=fty, Sw2=off, Or=ok
– t2: Sw1=on, Sw2=on, Or=fty
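The example is small enough to replay with a flat belief table over the 18 mode combinations, without hypertree or ADDs. The sketch below follows the per-step loop (condition on observations, extract leading solutions, transition expansion) under several assumptions that the slides leave open: Sw2.cmd is idl at t0, both switch inputs are hi, a switch that is off or fty leaves its output unconstrained, and an observation contributes a 0/1 consistency weight. All function names are hypothetical.

```python
from itertools import product

SW_PRIOR = {"on": 0.475, "off": 0.475, "fty": 0.05}
OR_PRIOR = {"ok": 0.99, "fty": 0.01}
STATES = list(product(SW_PRIOR, SW_PRIOR, OR_PRIOR))  # (Sw1, Sw2, Or)


def sw_trans(x, cmd, x_next):
    """Switch transition table: commands succeed with 0.95, failure is permanent."""
    if x == "fty":
        return 1.0 if x_next == "fty" else 0.0
    if x_next == "fty":
        return 0.05
    target = x if cmd == "idl" else cmd
    return 0.95 if x_next == target else 0.0


def or_trans(x, x_next):
    """Or-gate transition table."""
    return {("ok", "ok"): 0.99, ("ok", "fty"): 0.01, ("fty", "fty"): 1.0}.get(
        (x, x_next), 0.0)


def consistent(state, out):
    """Assumed 0/1 weight: a working Or-gate fed by an 'on' switch outputs hi."""
    sw1, sw2, gate = state
    if out is None or gate == "fty":
        return True
    return not (out == "lo" and "on" in (sw1, sw2))


def estimate(steps):
    """steps: list of (observed Or.out or None, Sw1.cmd, Sw2.cmd)."""
    belief = {s: SW_PRIOR[s[0]] * SW_PRIOR[s[1]] * OR_PRIOR[s[2]] for s in STATES}
    leading = []
    for out, cmd1, cmd2 in steps:
        # Condition on the observation.
        belief = {s: p if consistent(s, out) else 0.0 for s, p in belief.items()}
        # Extract the leading solution(s), keeping ties.
        best = max(belief.values())
        leading.append({s for s, p in belief.items() if best - p < 1e-12})
        if cmd1 is None:
            break
        # Transition expansion: multiply with transitions and sum out xt.
        expanded = dict.fromkeys(STATES, 0.0)
        for s, p in belief.items():
            for n in STATES:
                expanded[n] += (p * sw_trans(s[0], cmd1, n[0])
                                * sw_trans(s[1], cmd2, n[1]) * or_trans(s[2], n[2]))
        belief = expanded
    return leading


leading = estimate([(None, "on", "idl"), ("lo", "idl", "on"), ("lo", None, None)])
```

Under these assumptions `leading` reproduces the leading solutions listed above, including the switch at t2 from a switch fault to an Or-gate fault. The belief is left unnormalized, since normalization does not change which states lead.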

Page 33: Conclusion

– SCSPs are an elegant and general representation
– ADD encoding of SCSPs is efficient in the average case, exponential in the number of variables in the worst case
– Decomposition factors the problem into a set of ADDs, each confined to a small number of variables
– The two methods complement each other well
– How far can we get with this combination?