
Structured Models for Decision Making

Daphne Koller
Stanford University

MURI Program on Decision Making under Uncertainty
July 18, 2000

MURI Kickoff Meeting 7/18/00

2

Roadmap

[Diagram: a grid with columns Static, Dynamic, Decision Problem.
Row 1: Bayes Nets, DBNs, Factored MDPs.
Row 2: PRMs, Dynamic PRMs, Relational MDPs.
Annotations: Encapsulation, Reuse; Encapsulation, Approximation; Factored Policy Iteration, Efficient PRM inference.]


3

Outline
• Probabilistic Relational Models
  – Representing complex domains
  – Structural uncertainty
• Temporal models
• Decision making


4

Basic units of knowledge

[Diagram labels: entities, properties, relations; attributes]


5

So what?
• Set of entities and relations between them is determined at BN design time
  – structure must be known in advance
  – hard to adapt to changes
• BNs for complex domains are large & unstructured, hence very hard to build
• No ability to generalize
  – across “similar” individuals
  – across related situations

BNs are not suitable for representing complex, structured, flexible domains.


6

Probabilistic Relational Models

• Combine advantages of predicate logic & BNs:
  – natural domain modeling: objects, properties, relations;
  – generalization over a variety of situations;
  – compact, natural probability models.
• Integrate uncertainty with relational model:
  – properties of domain entities can depend on properties of related entities;
  – uncertainty over relational structure of domain.


7

Real-World Case Study

Battlefield situation assessment for missile units
• several locations
• many units
• each has detailed model

• Example object classes:
  – Battalion
  – Battery
  – Vehicle
  – Location
  – Weather
• Example relations:
  – At-Location
  – Has-Weather
  – Sub-battery/In-battalion
  – Sub-vehicle/In-battery


8

Scud Battery: Simplified PRM

[Diagram nodes: Under Fire, At-Location, #(Launcher.status = ok), Next Mission, Launcher Status, Report]


9

SCUD Battery Model


10

Cargo Vehicle Group

11


Original BN*: SCUD Battery

Disadvantages
• A lot more complex
  – must include relevant attributes of related objects
• Hard to transfer information between different BN models

*Built by IET, Inc.


12

Situation Models
• Complex situations can be described compactly by specifying objects and relations between them
• Class model is instantiated for each object, with probabilistic dependencies induced by relations

[Diagram: Angel Island, Alcatraz; 3rd Scud Battalion, 17th Scud Battalion; Scud Battery 1, Scud Battery 2, Scud Battery 3; Launcher 1]


13

Example reasoning pattern

[Diagram: Scud-Battalion-Charlie with objects Battery1, Group-TLs, TL1, TL2 and Loc; attributes under_fire, hit, #reported_damaged, damaged, rep_damaged and hide-support; value labels heavy, none, good; displayed probabilities 0.06, 0.44, 0.28, 0.33]


14

Inference in PRMs

PRM + situation description induces a BN over attributes.

[Diagram, situation description: Angel Island, Alcatraz; 3rd Scud Btn, 17th Scud Btn; Scud Bty 1, Scud Bty 2, Scud Bty 3; Launcher 1]

[Diagram, induced BN nodes: Under Fire, Attack, B1.Launch, B1.Success, B1.L1.Damaged, B1.L1.Report, B1.L2.Damaged, B1.L2.Report, B2.Launch, B2.Success, B2.L1.Damaged, B2.L1.Report, B2.L2.Damaged, B2.L2.Report]
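The step "PRM + situation description induces a BN over attributes" can be illustrated with a minimal sketch. This is not the talk's implementation: the class templates, the situation list and the helper `ground_attributes` are hypothetical, chosen only to reproduce node names like those on the slide (B1.Launch, B1.L1.Damaged). It builds node names only, not the dependencies or CPDs.

```python
from typing import Dict, List, Tuple

# Hypothetical class-level templates: each class lists its attributes once.
CLASS_ATTRIBUTES: Dict[str, List[str]] = {
    "Battery": ["Launch", "Success"],
    "Launcher": ["Damaged", "Report"],
}

# Hypothetical situation description: (object path, class name).
SITUATION: List[Tuple[str, str]] = [
    ("B1", "Battery"), ("B1.L1", "Launcher"), ("B1.L2", "Launcher"),
    ("B2", "Battery"), ("B2.L1", "Launcher"), ("B2.L2", "Launcher"),
]

def ground_attributes(situation, class_attributes):
    """Instantiate every class attribute once per object, giving ground BN node names."""
    return [f"{obj}.{attr}"
            for obj, cls in situation
            for attr in class_attributes[cls]]

nodes = ground_attributes(SITUATION, CLASS_ATTRIBUTES)
print(nodes)
```

The class model is written once and reused for every object of that class, which is the property the later "reuse" optimization relies on.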


15

Exploit Structure for Inference
• Encapsulation: objects interact in limited ways
  – Inference can be encapsulated within objects, with “communication” limited to interfaces
• Reuse: objects from same class have same model
  – Inference from one can be reused for others


16

Effects of exploiting structure

[Plot: running time in seconds (0 to 6000) versus #vehicles of each type / battery (1 to 10); curves: flat BN, no reuse, with reuse]


17

Extension: Structural Uncertainty
• Uncertainty about model structure:
  – Set of objects: is that radar signal from a tank?
  – Relations between objects: location of SCUD-Battalion-C
• Task 1: Seamless integration with probabilistic model
  – structural variables can depend on other variables
• Task 2: Efficient inference
  – Use approximate inference to simplify model
    • variational methods to summarize multiple potential influences
    • MCMC for traversing possible relationships
  – Use structured inference (encapsulation/reuse) on simplified model


18

Outline
• Probabilistic Relational Models
• Temporal models
  – Structured belief-state tracking
  – Dynamic PRMs: time, events and actions
• Decision making


19

Dynamic Bayesian Nets

[Diagram: DBN unrolled over time slices t, t+1, t+2, ...; each slice contains Action, Velocity, Position and Observed_pos]

$$P(\mathrm{State}(t+1) \mid \mathrm{State}(t)) = P(A^{(t+1)} \mid A^{(t)}) \, P(V^{(t+1)} \mid V^{(t)}, A^{(t)}) \, P(L^{(t+1)} \mid L^{(t)}, V^{(t)})$$

• Compact representation of system dynamics
  – discrete, continuous, hybrid

• Generalization of Kalman filters


20

Tracking System State

• In discrete/hybrid systems, belief state representation is exponential in # of state variables

• In hybrid systems, # of distinct hypotheses grows exponentially over time

Task: Maintain belief state, i.e., the distribution over the current state given the evidence so far

[Diagram: DBN over time slices t, t+1, t+2 with Action, Velocity, Position and Observed_pos nodes]
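As a point of reference for the tracking task, here is a minimal sketch of one exact belief-state update for a tiny discrete system. The two-state transition matrix and observation likelihoods are made-up illustrative numbers, not taken from the talk. With n binary state variables the `belief` vector would have 2^n entries, which is the exponential blowup noted above.

```python
def belief_update(belief, transition, obs_likelihood):
    """One exact filtering step: predict through the dynamics, weight by evidence, normalize.

    belief[i]          -- P(state = i | evidence so far)
    transition[i][j]   -- P(next state = j | state = i)
    obs_likelihood[j]  -- P(current observation | next state = j)
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    unnormalized = [predicted[j] * obs_likelihood[j] for j in range(n)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Illustrative (assumed) numbers for a two-state system.
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
obs_likelihood = [0.7, 0.1]
new_belief = belief_update(belief, transition, obs_likelihood)
print(new_belief)
```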


21

Approximate Tracking
• Decompose belief state along “subsystem lines”
  – Maintain belief state as product of marginals
• In hybrid systems, keep mixture of hypotheses for every subsystem
  – Merge hypotheses associated with similar density

[Diagram: nodes H, X, D; a True/False distribution with probabilities 0.7 and 0.3]
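The "product of marginals" idea can be sketched as a projection step. This is a simplified illustration under stated assumptions: every binary variable is treated as its own subsystem (real decompositions would use larger clusters), and the joint distribution below is invented for the example.

```python
from itertools import product

def marginals(joint, n_vars):
    """P(X_k = 1) for each binary variable, computed from a joint over 0/1 tuples."""
    return [sum(p for state, p in joint.items() if state[k] == 1)
            for k in range(n_vars)]

def product_of_marginals(margs):
    """Approximate joint that treats the variables as independent."""
    approx = {}
    for state in product([0, 1], repeat=len(margs)):
        p = 1.0
        for k, value in enumerate(state):
            p *= margs[k] if value == 1 else 1.0 - margs[k]
        approx[state] = p
    return approx

# Assumed joint over two correlated binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
margs = marginals(joint, 2)
approx = product_of_marginals(margs)
print(margs, approx)
```

The approximation keeps only n numbers instead of 2^n, at the price of discarding the correlation between the variables (here the true P(1, 1) = 0.4 becomes 0.25).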


22

Case Study: Diagnosis & Tracking for Five-Tank System

• State space per time slice
  – eleven-dimensional continuous space
  – 227 discrete failure modes

[Diagram: five-tank system; observables F1o, F5o, F23]


23

The doomsday scenario

[Plot: traces C12, C45, C23 over time steps 0 to 50 (vertical axis 0 to 2); measurement errors: F23, F5o; annotated events: neg drift, bursts]


24

Algorithm Performance

[Two plots over time steps 0 to 50 (vertical axis 0.4 to 2), each showing C12, C45 and P5; one panel is labeled Omniscient Kalman Filter]


25

Dynamic PRMs
• Goal: Model complex structured systems
  – that evolve over time
  – where agents take compound structured actions
  and construct an effective, scalable inference algorithm
• Easy part: Add time relation to PRMs
  – Allows notion of current and previous state
  – Maintains notions of structured objects and relations
• Challenges:
  – Appropriate representation for actions, events
  – Modeling changes in domain structure (objects, relations)
  – Effective inference that exploits structure


26

Dynamic PRMs: Event Models

• Events can be triggered by external events
  – an agent’s action
  or by system dynamics
  – e.g., a unit reaches its destination
• Events can influence the system structure
  – discrete change in continuous dynamics
    • truck velocity goes to 0 when destination is reached
  – modification of relational structure
    • aircraft taking off is no longer on aircraft carrier
  – creation / deletion of objects
    • units entering/leaving battlespace

Events: Discrete points where the system undergoes a discontinuous change


27

Dynamic PRMs: Adding Actions
• Use relational / hierarchical action representation
  – class hierarchy for Move action
  – an instantiation of a particular action is related to object moving, road taken, origin, destination
• Actions can depend on and influence attributes of related objects
  – duration of Move action may depend on road condition, influence status of moving objects
• Actions are like events, can change domain structure
• Complex actions can be composed of simpler ones:
  – Effects of complex action derived from that of subactions


28

Inference in Dynamic Systems
• Main tasks:
  – situation monitoring
  – prediction
• Goal: Exploit structure as we did in PRMs
• First step: Encapsulation
  – Exploit structure of weakly interacting subsystems
  – Applied successfully to Dynamic Bayesian Nets


29

Tracking in Dynamic PRMs
• Use relational structure to guide belief state approximation
  – direct dependencies only between related objects
• Deal with dynamic structure:
  – relations and even domain objects change over time
  – want to adjust our approximation to context
  – structural uncertainty critical
• Event-driven tracking
  – no reason to use fine-grained model of “boring bits”
  – but “fast forward” requires ability to propagate dynamics over variable-length segments


30

Outline
• Probabilistic Relational Models
• Temporal models
• Decision making
  – Planning in factored MDPs
  – Planning in relational MDPs


31

What is a Markov Decision Process?

• An MDP is a controlled dynamic process
• Stochastic transition between states
• Actions affect system dynamics
• Rewards or costs are associated with states
• Objective: Drive process to regions of high reward
  – MDP solutions are policies
  – Policies assign an action to every state


32

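MDP Policies & Value Functions

Suppose an expert told you the “value” of each state:

V(s1) = 10, V(s2) = 5

[Diagram: Action 1 with transitions to s1 and s2, edge probabilities 0.5 and 0.5; Action 2 with transitions to s1 and s2, edge probabilities 0.7 and 0.3]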


33

Greedy Policy Construction

Pick action with highest expected future value:

$$\pi(s) = \arg\max_a \Big[ R(s) + \sum_{s'} P(s' \mid s, a) \, V(s') \Big]$$

The sum is the expectation over next-state values.

π = greedy(V)
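A small worked example of greedy action selection, using the state values V(s1) = 10 and V(s2) = 5 from the preceding slide. It assumes, which the extracted text does not confirm, that Action 1 reaches s1 and s2 with probabilities 0.5 and 0.5 and that Action 2 reaches them with 0.7 and 0.3. Since R(s) does not depend on the action, only the expected next-state value is compared.

```python
# State values as given on the slide.
V = {"s1": 10.0, "s2": 5.0}

# Assumed reading of the diagram: next-state distribution for each action.
P = {
    "Action 1": {"s1": 0.5, "s2": 0.5},
    "Action 2": {"s1": 0.7, "s2": 0.3},
}

def expected_next_value(action):
    """Expectation of V over the next state reached by `action`."""
    return sum(prob * V[s_next] for s_next, prob in P[action].items())

q = {action: expected_next_value(action) for action in P}
greedy_action = max(q, key=q.get)
print(q, greedy_action)
```

Under these assumptions Action 1 scores 0.5·10 + 0.5·5 = 7.5 and Action 2 scores 0.7·10 + 0.3·5 = 8.5, so the greedy choice is Action 2.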


34

Bootstrapping: Policy Iteration

Idea: Greedy selection is useful even with suboptimal V

Guess V
Repeat until policy doesn’t change:
  π = greedy(V)
  V = value of acting on π

Guaranteed to find globally optimal policy if V is defined over explicit states, i.e., if the representation of V is exponential in the number of state variables

Exploit Structure with Factored Policy Iteration
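The explicit-state version of this loop can be sketched in a few lines. This is a generic textbook-style illustration, not the factored algorithm of the talk: the two-state MDP, the discount factor `gamma = 0.9` and the use of repeated backups for policy evaluation are all assumptions made for the example.

```python
def greedy(V, R, P, gamma):
    """In each state, pick the action maximizing R(s) + gamma * E[V(s')]."""
    return {
        s: max(P[s], key=lambda a: R[s] + gamma * sum(p * V[t] for t, p in P[s][a].items()))
        for s in P
    }

def evaluate(policy, R, P, gamma, sweeps=500):
    """Approximate the value of acting on `policy` by repeated Bellman backups."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: R[s] + gamma * sum(p * V[t] for t, p in P[s][policy[s]].items())
             for s in P}
    return V

def policy_iteration(R, P, gamma=0.9):
    V = {s: 0.0 for s in P}              # "Guess V"
    policy = greedy(V, R, P, gamma)
    while True:                          # repeat until the policy doesn't change
        V = evaluate(policy, R, P, gamma)
        new_policy = greedy(V, R, P, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Assumed toy MDP: reward 1 in s1, 0 in s2; P[state][action] -> next-state distribution.
R = {"s1": 1.0, "s2": 0.0}
P = {
    "s1": {"stay": {"s1": 0.9, "s2": 0.1}, "move": {"s1": 0.1, "s2": 0.9}},
    "s2": {"stay": {"s1": 0.1, "s2": 0.9}, "move": {"s1": 0.9, "s2": 0.1}},
}
policy, V = policy_iteration(R, P)
print(policy, V)
```

Both `V` and `policy` are tables with one entry per explicit state, which is exactly what becomes infeasible when the state is described by many variables.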


35

Factored MDPs: DBNs + Rewards

[Diagram: DBN over variables X, Y, Z at times t and t+1, with reward nodes R1 and R2]

• Rewards have small sets of parent variables too
• Total reward adds sub-rewards: R = R1 + R2


36

Linearly Decomposable Value Functions

Approximate high-dimensional value function with combination of lower-dimensional functions

Motivation: Multi-attribute utility theory (Keeney & Raiffa)

Note: Overlapping is allowed!


37

Decomposable Value Functions

• Each basis function h_i is the status of some small part(s) of a complex system
  – status of a machine
  – inventory of a store
  – status of a subgoal

$$\tilde{V}(s) = \sum_i w_i h_i(s) = A w$$

Linear combination of restricted domain functions
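A minimal sketch of a linearly decomposable value function, where row s of the matrix A holds the basis values h_i(s). The state variables (`machine_ok`, `in_stock`), the three basis functions and the weights are hypothetical, chosen only to echo the slide's examples.

```python
from itertools import product

# Hypothetical state: (machine_ok, in_stock), both binary.
states = list(product([0, 1], repeat=2))

# Each basis function looks at a small part of the state (restricted domain).
basis_functions = [
    lambda s: 1.0,          # constant offset
    lambda s: float(s[0]),  # status of a machine
    lambda s: float(s[1]),  # inventory of a store
]

def make_basis_matrix(states, basis_functions):
    """A[s][i] = h_i(s)."""
    return [[h(s) for h in basis_functions] for s in states]

def approx_values(A, w):
    """V~ = A w, computed row by row."""
    return [sum(a * w_i for a, w_i in zip(row, w)) for row in A]

A = make_basis_matrix(states, basis_functions)
w = [1.0, 5.0, 2.0]  # assumed weights
values = approx_values(A, w)
print(dict(zip(states, values)))
```

Only the three weights need to be learned or stored, rather than one value per state.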


38

Exploiting Structure

Key operation: backprojection of a basis function through a DBN transition

[Diagram: DBN over X, Y, Z; basis function h_1 = f(z) and its backprojection P h_1 = f(y, z), each drawn as a tree over variable assignments]

Structure allows us to consider operations over small subsets of variables, not the entire state space.
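Backprojection can be sketched for the case where the basis function depends only on z and the next value of z depends only on y and z. The conditional probabilities below are invented; the point is that the result is a table over (y, z) with four entries, no matter how many other variables (such as x) the state contains.

```python
from itertools import product

def backproject(h, cpd, parent_domains):
    """g(parents) = sum over z' of P(z' | parents) * h(z')."""
    g = {}
    for parents in product(*parent_domains):
        g[parents] = sum(p * h[z_next] for z_next, p in cpd[parents].items())
    return g

# Basis function over the next-step value of z (assumed).
h = {0: 0.0, 1: 1.0}

# Assumed CPD P(z' | y, z), keyed by the parent assignment (y, z).
cpd = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.4, 1: 0.6},
    (1, 1): {0: 0.1, 1: 0.9},
}

g = backproject(h, cpd, [(0, 1), (0, 1)])
print(g)
```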


39

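Policy Format

Factored value functions and compact action effect descriptions

[Diagram: for Action 1 and Action 2, trees over assignments to x, y, z with leaf values +8, +12, +11, +1, +4, +7]

Sorted result values form a decision list:
If … then action 1; else if … then action 2; else if … then action 1
[The conditions are assignments to subsets of the variables x, y, z.]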


40

Factored Policy Iteration: Summary

Guess V
π = greedy(V)
V = value of acting on π

• Structure induces decision-list policy
• Key operations isomorphic to BN inference
• Time per iteration reduced from O((2^n)^3) to O(C_b k^3)
• C_b = cost of Bayes net inference (function of structure)
• k = number of basis functions (k << 2^n)


41

Run Times

Note: Nearly optimal policy found in all cases ( 6).

[Plot: CPU seconds and number of states (0 to 70000) versus number of state variables (4 to 16); curves: States, Seconds, 3n^3]


42

Planning in Relational MDPs
• Replace DBN transition model with dynamic PRM
• Generalize factored policy iteration
  – Define basis functions via relational formulas:
      if ∃x: tank(x) ∧ closeto(x, base) then 10, else −5
  – Replace BN inference with PRM inference as key step
• Exploit hierarchical structure of complex actions by encapsulating decision making along hierarchy
• Potential benefits:
  – Tractable approximate planning in relational domains
  – Unification of classical and stochastic planning
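A relational basis function of this kind can be sketched as an ordinary function over a set of objects and relations. The sketch assumes one plausible reading of the slide's formula (value 10 if some tank is close to the base, otherwise -5); the object records and the `close_to` relation are hypothetical.

```python
def tank_near_base_value(objects, close_to):
    """Return 10 if some object is a tank that is close to 'base', else -5.

    objects  -- list of dicts with 'name' and 'type' keys (hypothetical schema)
    close_to -- set of (name, place) pairs representing the closeto relation
    """
    some_tank_near = any(
        obj["type"] == "tank" and (obj["name"], "base") in close_to
        for obj in objects
    )
    return 10 if some_tank_near else -5

objects = [
    {"name": "t1", "type": "tank"},
    {"name": "t2", "type": "tank"},
    {"name": "truck1", "type": "truck"},
]

value_far = tank_near_base_value(objects, {("truck1", "base")})
value_near = tank_near_base_value(objects, {("truck1", "base"), ("t2", "base")})
print(value_far, value_near)
```

Because the formula quantifies over objects, the same basis function applies unchanged to situations with any number of tanks.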


43

Conclusions: Past & Present
• PRMs compactly represent complex systems with multiple interacting objects:
  – coherent (probabilistic) semantics;
  – structured representation: modularity & reuse.
• Scalable inference that exploits structure
• Tracking algorithms for DBNs that exploit system decomposition
• Planning algorithms in MDPs that exploit structure of system and of value functions

Theme: Representation & inference scale up, if we exploit structure


44

Conclusions: Future
• Better inference for densely connected PRMs
• Extending PRMs with time, events, actions
• Exploit structure for inference in dynamic PRMs:
  – system decomposition into subsystems
  – relational context
  – varying time granularity
• Planning in dynamic PRMs:
  – extend factored policy iteration to PRMs
  – exploit hierarchical action decomposition


45

Acknowledgements
• Students & postdocs (grouped on the slide as PhD students, Postdocs, Ugrad)
  – Nir Friedman (Hebrew U.)
  – Dirk Ormoneit
  – Ron Parr (Duke)
  – Xavier Boyen
  – Urszula Chajewska
  – Lise Getoor
  – Carlos Guestrin
  – Uri Lerner
  – Uri Nodelman
  – Avi Pfeffer (Harvard)
  – Eran Segal
  – Benjamin Taskar
  – Simon Tong
  – Brian Milch (Berkeley)
  – Ken Takusagawa (MIT)
• Support:
  – PECASE Award via ONR YIP
  – DARPA’s HPKB Program
  – MURI Program “Integrated Approach to Intelligent Systems”
  – Sloan Faculty Fellowship
  – DARPA’s IA Program under subcontract to SRI International
  – DARPA’s DMIF Program under subcontract to IET Inc.
  – ONR grant

http://robotics.stanford.edu/~koller/