Structured Models for Decision Making
Daphne Koller
Stanford University
MURI Program on Decision Making under Uncertainty
July 18, 2000
MURI Kickoff Meeting 7/18/00
Roadmap
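[Diagram: model families arranged by problem type (Static, Dynamic, Decision Problem).
Model families: Bayes Nets, DBNs, Factored MDPs; PRMs, Dynamic PRMs, Relational MDPs.
Other labels: Encapsulation, Reuse; Encapsulation, Approximation; Factored Policy Iteration, Efficient PRM inference.]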
Outline
• Probabilistic Relational Models
  – Representing complex domains
  – Structural uncertainty
• Temporal models
• Decision making
Basic units of knowledge
entities, properties, relations
attributes
So what?
• Set of entities and relations between them is determined at BN design time
  – structure must be known in advance
  – hard to adapt to changes
• BNs for complex domains are large & unstructured, so very hard to build
• No ability to generalize
  – across “similar” individuals
  – across related situations
BNs are not suitable for representing complex, structured, flexible domains.
Probabilistic Relational Models
• Combine advantages of predicate logic & BNs:
  – natural domain modeling: objects, properties, relations;
  – generalization over a variety of situations;
  – compact, natural probability models.
• Integrate uncertainty with relational model:
  – properties of domain entities can depend on properties of related entities;
  – uncertainty over relational structure of domain.
Real-World Case Study
Battlefield situation assessment for missile units
• several locations
• many units
• each has detailed model
• Example object classes:
  – Battalion
  – Battery
  – Vehicle
  – Location
  – Weather.
• Example relations:
  – At-Location
  – Has-Weather
  – Sub-battery/In-battalion
  – Sub-vehicle/In-battery
Scud Battery: Simplified PRM
[Figure: simplified PRM for a Scud battery; node labels: Under Fire, At-Location, #(Launcher.status = ok), Next Mission, Launcher Status, Report]
SCUD Battery Model
Cargo Vehicle Group
Original BN*: SCUD Battery
Disadvantages
• A lot more complex
  – must include relevant attributes of related objects
• Hard to transfer information between different BN models
*Built by IET, Inc.
Situation Models
• Complex situations can be described compactly by specifying objects and relations between them
• Class model is instantiated for each object, with probabilistic dependencies induced by relations
[Figure: example situation with locations Angel Island and Alcatraz, 3rd Scud Battalion and 17th Scud Battalion, Scud Battery 1, Scud Battery 2, Scud Battery 3, and Launcher 1]
Example reasoning pattern
[Figure: instantiated model for Scud-Battalion-Charlie; objects Battery1, Group-TLs, TL1, TL2, Loc; attribute labels under_fire, hit, damaged, rep_damaged, #reported_damaged, hide-support; value labels heavy, none, good; displayed probabilities 0.06, 0.44, 0.28, 0.33]
Inference in PRMs
PRM + situation description induces a BN over attributes.
[Figure: situation description (Angel Island, Alcatraz; 3rd Scud Btn, 17th Scud Btn; Scud Bty 1, Scud Bty 2, Scud Bty 3; Launcher 1) and the induced BN with nodes Under Fire, Attack, B1.Launch, B1.Success, B1.L1.Damaged, B1.L1.Report, B1.L2.Damaged, B1.L2.Report, B2.Launch, B2.Success, B2.L1.Damaged, B2.L1.Report, B2.L2.Damaged, B2.L2.Report]
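How a class-level model plus a situation description yields a ground BN can be sketched as follows. This is only an illustration: the template format, the `ground_bn` function, and the particular dependencies (for example, a launcher's `Damaged` depending on its battery's `Launch`) are assumptions made for the sketch, with node names borrowed from the figure.

```python
# Hypothetical sketch: unroll class-level dependency templates into a ground BN.
# Templates map attribute -> list of (relation, parent attribute). "self" means
# the parent is on the same object; "battery" follows the In-battery relation.
# These particular dependencies are assumptions, not the model from the talk.
TEMPLATES = {
    "Battery": {"Launch": [], "Success": [("self", "Launch")]},
    "Launcher": {"Damaged": [("battery", "Launch")],
                 "Report": [("self", "Damaged")]},
}

# Situation description: objects, their classes, and relations between them.
OBJECTS = {"B1": "Battery", "B1.L1": "Launcher", "B1.L2": "Launcher"}
RELATIONS = {"B1.L1": {"battery": "B1"}, "B1.L2": {"battery": "B1"}}

def ground_bn(templates, objects, relations):
    """Return {ground node: [ground parents]} for every object attribute."""
    bn = {}
    for obj, cls in objects.items():
        for attr, parents in templates[cls].items():
            ground_parents = []
            for rel, parent_attr in parents:
                target = obj if rel == "self" else relations[obj][rel]
                ground_parents.append(f"{target}.{parent_attr}")
            bn[f"{obj}.{attr}"] = ground_parents
    return bn

bn = ground_bn(TEMPLATES, OBJECTS, RELATIONS)
for node, parents in bn.items():
    print(node, "<-", parents)
```

Adding a second battery only requires new entries in `OBJECTS` and `RELATIONS`; the class templates stay unchanged.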
Exploit Structure for Inference
• Encapsulation: objects interact in limited ways
  – Inference can be encapsulated within objects, with “communication” limited to interfaces
• Reuse: objects from same class have same model
  – Inference from one can be reused for others
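A toy sketch of both ideas, under stated assumptions: each launcher has an internal variable (damaged) that is summed out inside the object, so only a message over the interface variable (under_fire) is passed out; since all launchers share one class model, that message is cached and reused. The probabilities and the function names `launcher_message` and `posterior_under_fire` are made up for illustration and are not the inference algorithm from the talk.

```python
from functools import lru_cache

# Assumed (made-up) CPDs for a hypothetical Launcher class.
P_DAMAGED = {True: 0.6, False: 0.1}   # P(damaged | battery under_fire)
P_REPORT = {True: 0.9, False: 0.2}    # P(report damaged | damaged)

calls = 0  # counts how often the per-class computation actually runs

@lru_cache(maxsize=None)
def launcher_message(report):
    """Encapsulation: sum out the launcher's internal variable (damaged).

    Returns P(report | under_fire) for under_fire in (True, False); only this
    interface quantity leaves the object. Caching provides the reuse.
    """
    global calls
    calls += 1
    msg = []
    for under_fire in (True, False):
        p_d = P_DAMAGED[under_fire]
        p_r_given_d = P_REPORT[True] if report else 1 - P_REPORT[True]
        p_r_given_not_d = P_REPORT[False] if report else 1 - P_REPORT[False]
        msg.append(p_d * p_r_given_d + (1 - p_d) * p_r_given_not_d)
    return tuple(msg)

def posterior_under_fire(reports, prior=0.5):
    """Combine the interface messages of all launchers in one battery."""
    w_true, w_false = prior, 1 - prior
    for r in reports:
        m_true, m_false = launcher_message(r)
        w_true *= m_true
        w_false *= m_false
    return w_true / (w_true + w_false)

# Ten launchers, but the per-class computation runs once per distinct report.
p = posterior_under_fire([True] * 8 + [False] * 2)
print(p, calls)
```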
Effects of exploiting structure
[Figure: running time in seconds (0 to 6000) versus #vehicles of each type / battery (1 to 10), for three methods: flat BN, no reuse, with reuse]
Extension: Structural Uncertainty
• Uncertainty about model structure:
  – Set of objects: is that radar signal from a tank?
  – Relations between objects: location of SCUD-Battalion-C
• Task 1: Seamless integration w. probabilistic model
  – structural variables can depend on other variables.
• Task 2: Efficient Inference
  – Use approximate inference to simplify model
    • variational methods to summarize multiple potential influences
    • MCMC for traversing possible relationships
  – Use structured inference (encapsulation/reuse) on simplified model
Outline
• Probabilistic Relational Models
• Temporal models
  – Structured belief-state tracking
  – Dynamic PRMs: time, events and actions
• Decision making
Dynamic Bayesian Nets
[Figure: DBN unrolled over time slices t, t+1, t+2, ..., with nodes Action, Velocity, Position and Observed_pos in each slice]
$P(\mathrm{State}^{(t+1)} \mid \mathrm{State}^{(t)}) = P(A^{(t+1)} \mid A^{(t)})\, P(V^{(t+1)} \mid V^{(t)}, A^{(t)})\, P(L^{(t+1)} \mid L^{(t)}, V^{(t)})$
• Compact representation of system dynamics
  – discrete, continuous, hybrid
• Generalization of Kalman filters
Tracking System State
• In discrete/hybrid systems, belief state representation is exponential in # of state variables
• In hybrid systems, # of distinct hypotheses grows exponentially over time
Task: Maintain belief state: distribution over current state given evidence so far
[Figure: DBN unrolled over time slices t, t+1, t+2, ..., with nodes Action, Velocity, Position and Observed_pos]
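For a system with a small explicit state space, the belief-state update can be written directly. The following is a minimal sketch of exact tracking (predict, then condition on the observation); the function name `update_belief` and the two-state numbers are illustrative assumptions.

```python
def update_belief(belief, transition, obs_likelihood):
    """One step of exact belief-state tracking over explicit states.

    belief[s]          : P(state_t = s | evidence so far)
    transition[s][s2]  : P(state_{t+1} = s2 | state_t = s)
    obs_likelihood[s2] : P(observation_{t+1} | state_{t+1} = s2)
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Condition: weight by the observation likelihood and renormalize.
    unnorm = [predicted[s2] * obs_likelihood[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Made-up two-state example (0 = "moving", 1 = "stopped").
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
belief = update_belief(belief, transition, obs_likelihood=[0.7, 0.1])
print(belief)
```

With n binary state variables the `belief` list would need 2^n entries, which is the exponential representation cost noted above.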
Approximate Tracking
• Decompose belief state along “subsystem lines”
  – Maintain belief state as product of marginals
• In hybrid systems, keep mixture of hypotheses for every subsystem
  – Merge hypotheses associated with similar density
[Figure: subsystem with variables H, X, D; a True/False distribution with values 0.7 and 0.3]
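The "product of marginals" step can be illustrated on a tiny discrete example. This sketch assumes binary variables and treats every variable as its own subsystem; the function name `project_to_marginals` and the joint distribution are made up.

```python
from itertools import product

def project_to_marginals(joint, n_vars):
    """Replace a joint over n binary variables by the product of its marginals.

    joint maps tuples of 0/1 values to probabilities. Treating each variable
    as a separate subsystem is an assumption of this sketch.
    """
    # Marginal P(X_i = 1) for every variable.
    marginals = []
    for i in range(n_vars):
        marginals.append(sum(p for state, p in joint.items() if state[i] == 1))
    # Rebuild an approximate joint as the product of those marginals.
    approx = {}
    for state in product((0, 1), repeat=n_vars):
        p = 1.0
        for i, v in enumerate(state):
            p *= marginals[i] if v == 1 else 1 - marginals[i]
        approx[state] = p
    return marginals, approx

# A correlated joint over two variables: the projection drops the correlation.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
marginals, approx = project_to_marginals(joint, 2)
print(marginals, approx)
```

Only the `marginals` list (n numbers) needs to be stored between time steps, rather than the 2^n entries of the full joint.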
Case Study: Diagnosis & Tracking for Five-Tank System
• State space per time slice
  – eleven-dimensional continuous space
  – 227 discrete failure modes
[Figure: five-tank system; observables include F1o, F23, F5o]
The doomsday scenario
[Figure: time series over time steps 0 to 50 (vertical axis 0 to 2) for C12, C45 and C23, annotated with measurement errors on F23 and F5o, negative drift, a negative drift burst, and bursts]
Algorithm Performance
[Figure: two panels over time steps 0 to 50 (vertical axis 0.4 to 2) tracking C12, C45 and P5; one panel is labeled Omniscient Kalman Filter]
Dynamic PRMs
• Goal: Model complex structured systems
  – that evolve over time
  – where agents take compound structured actions
  & construct effective scalable inference algorithm
• Easy part: Add time relation to PRMs
  – Allows notion of current and previous state
  – Maintains notions of structured objects and relations
• Challenges:
  – Appropriate representation for actions, events
  – Modeling changes in domain structure (objects, relations)
  – Effective inference that exploits structure
Dynamic PRMs: Event Models
• Events can be triggered by external events
  – an agent’s action
  or by system dynamics
  – e.g., a unit reaches its destination
• Events can influence the system structure
  – discrete change in continuous dynamics
    • truck velocity goes to 0 when destination is reached
  – modification of relational structure
    • aircraft taking off is no longer on aircraft carrier
  – creation / deletion of objects
    • units entering/leaving battlespace
Events: Discrete points where the system undergoes a discontinuous change
Dynamic PRMs: Adding Actions
• Use relational / hierarchical action representation
  – class hierarchy for Move action
  – an instantiation of a particular action is related to object moving, road taken, origin, destination
• Actions can depend on and influence attributes of related objects
  – duration of Move action may depend on road condition, influence status of moving objects
• Actions are like events, can change domain structure
• Complex actions can be composed of simpler ones:
  – Effects of complex action derived from that of subactions
Inference in Dynamic Systems
• Main tasks:
  – situation monitoring
  – prediction
• Goal: Exploit structure as we did in PRMs
• First step: Encapsulation
  – Exploit structure of weakly interacting subsystems
  – Applied successfully to Dynamic Bayesian Nets
Tracking in Dynamic PRMs
• Use relational structure to guide belief state approximation
  – direct dependencies only between related objects
• Deal with dynamic structure:
  – relations and even domain objects change over time
  – want to adjust our approximation to context
  – structural uncertainty critical
• Event-driven tracking
  – no reason to use fine-grained model of “boring bits”
  – but “fast forward” requires ability to propagate dynamics over variable-length segments
Outline
• Probabilistic Relational Models
• Temporal models
• Decision making
  – Planning in factored MDPs
  – Planning in relational MDPs
What is a Markov Decision Process?
• An MDP is a controlled dynamic process
• Stochastic transition between states
• Actions affect system dynamics
• Rewards or costs are associated with states
• Objective: Drive process to regions of high reward
  – MDP solutions are policies
  – Policies assign an action to every state
MDP Policies & Value Functions
Suppose an expert told you the “value” of each state:
V(s1) = 10, V(s2) = 5
[Figure: two actions, each leading to s1 or s2; Action 1 with probabilities 0.5 and 0.5, Action 2 with probabilities 0.7 and 0.3]
Greedy Policy Construction
Pick action with highest expected future value:
$\pi(s) = \arg\max_a \left[ R(s) + \sum_{s'} P(s' \mid s, a)\, V(s') \right]$
(the sum is the expectation over next-state values)
$\pi = \mathrm{greedy}(V)$
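A small sketch of greedy policy construction, using V(s1) = 10 and V(s2) = 5 from the earlier slide. Several things are assumed for illustration only: rewards are zero, both states share the same action effects, and the first probability listed for each action is taken to be the one for s1. The function name `greedy_policy` is hypothetical.

```python
def greedy_policy(states, actions, R, P, V):
    """pi(s) = argmax_a [ R(s) + sum_{s'} P(s' | s, a) * V(s') ]."""
    policy = {}
    for s in states:
        def backup(a):
            return R[s] + sum(P[(s, a)][s2] * V[s2] for s2 in states)
        policy[s] = max(actions, key=backup)
    return policy

states = ["s1", "s2"]
actions = ["Action 1", "Action 2"]
V = {"s1": 10, "s2": 5}
R = {"s1": 0, "s2": 0}                      # assumed: no immediate reward
effects = {                                 # assumed mapping of probabilities
    "Action 1": {"s1": 0.5, "s2": 0.5},
    "Action 2": {"s1": 0.7, "s2": 0.3},
}
P = {(s, a): effects[a] for s in states for a in actions}

policy = greedy_policy(states, actions, R, P, V)
print(policy)
```

Under these assumptions Action 1 has expected next-state value 0.5·10 + 0.5·5 = 7.5 and Action 2 has 0.7·10 + 0.3·5 = 8.5, so the greedy choice is Action 2.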
Bootstrapping: Policy Iteration
Idea: Greedy selection is useful even with suboptimal V
Guess V
Repeat until policy doesn’t change:
  $\pi$ = greedy(V)
  V = value of acting on $\pi$
Guaranteed to find globally optimal policy if V is defined over explicit states, i.e., if the representation of V is exponential in the number of state variables
Exploit Structure with Factored Policy Iteration
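The loop can be sketched for a small MDP with explicit states. This is a generic tabular policy iteration, not the factored variant; the discount factor `gamma`, the iterative policy evaluation, and the two-state toy MDP are assumptions added for the example.

```python
def evaluate(policy, states, R, P, gamma, sweeps=500):
    """Value of acting on a fixed policy, by repeated backups."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: R[s] + gamma * sum(P[(s, policy[s])][s2] * V[s2] for s2 in states)
             for s in states}
    return V

def greedy(V, states, actions, R, P, gamma):
    """Pick, in every state, the action with the highest one-step backup."""
    return {s: max(actions, key=lambda a: R[s] + gamma * sum(
                P[(s, a)][s2] * V[s2] for s2 in states)) for s in states}

def policy_iteration(states, actions, R, P, gamma=0.9):
    V = {s: 0.0 for s in states}                    # guess V
    policy = greedy(V, states, actions, R, P, gamma)
    while True:
        V = evaluate(policy, states, R, P, gamma)   # V = value of acting on pi
        new_policy = greedy(V, states, actions, R, P, gamma)  # pi = greedy(V)
        if new_policy == policy:                    # policy doesn't change
            return policy, V
        policy = new_policy

# Toy MDP (made up): reward only in s2; "go" usually switches state.
states = ["s1", "s2"]
actions = ["stay", "go"]
R = {"s1": 0.0, "s2": 1.0}
P = {
    ("s1", "stay"): {"s1": 1.0, "s2": 0.0},
    ("s1", "go"): {"s1": 0.1, "s2": 0.9},
    ("s2", "stay"): {"s1": 0.0, "s2": 1.0},
    ("s2", "go"): {"s1": 0.9, "s2": 0.1},
}
policy, V = policy_iteration(states, actions, R, P)
print(policy, V)
```

The dictionaries `V` and `P` are indexed by explicit states, which is exactly what becomes infeasible when the state is described by many variables.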
Factored MDPs: DBNs + Rewards
[Figure: two-slice DBN over variables X, Y, Z (time t to t+1) with reward nodes R1 and R2]
Rewards have small sets of parent variables too
Total reward adds sub-rewards: R = R1 + R2
Linearly Decomposable Value Functions
Approximate high-dimensional value function with combination of lower-dimensional functions
Motivation: Multi-attribute utility theory (Keeney & Raiffa)
Note: Overlapping is allowed!
Decomposable Value Functions
• Each basis function $h_i$ is the status of some small part(s) of a complex system
  – status of a machine
  – inventory of a store
  – status of a subgoal
$\tilde{V}(s) = \sum_i w_i h_i(s)$, i.e. $\tilde{V} = A w$
Linear combination of restricted domain functions
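As a concrete reading of the linear form, the sketch below evaluates a weighted sum of basis functions, each of which looks at only one part of the state. The basis functions, state fields, and weights are hypothetical, chosen to mirror the three examples in the list above.

```python
# Hypothetical basis functions, each restricted to one small part of the state.
def h_machine(state):
    return 1.0 if state["machine_ok"] else 0.0

def h_inventory(state):
    return float(state["inventory"])

def h_subgoal(state):
    return 1.0 if state["subgoal_done"] else 0.0

BASIS = [h_machine, h_inventory, h_subgoal]

def approx_value(state, weights, basis=BASIS):
    """V~(s) = sum_i w_i * h_i(s)."""
    return sum(w * h(state) for w, h in zip(weights, basis))

state = {"machine_ok": True, "inventory": 3, "subgoal_done": False}
value = approx_value(state, weights=[5.0, 0.5, 10.0])
print(value)  # 5.0*1 + 0.5*3 + 10.0*0 = 6.5
```

Only the weight vector has to be learned or computed; its length is the number of basis functions, not the number of states.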
Exploiting Structure
Key operation: backprojection of a basis function thru a DBN transition
[Figure: DBN over X, Y, Z and tree-structured functions over x, y, z]
$h_1 = f(z)$, $P h_1 = f(y, z)$
Structure allows us to consider operations over small subsets of variables, not the entire state space.
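Backprojection can be sketched for one basis function. The example assumes binary variables and that the next-step value of Z depends only on the current Y and Z; the CPD numbers and the names `P_Z_NEXT`, `h1` and `backproject` are illustrative assumptions. The result is a function of (y, z) only, so X is never enumerated.

```python
from itertools import product

# Assumed CPD: P(Z' = 1 | Y = y, Z = z), indexed by (y, z).
P_Z_NEXT = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}

def h1(z_next):
    """Basis function over the single variable Z (assumed values)."""
    return 10.0 if z_next == 1 else 0.0

def backproject(h, cpd):
    """g(y, z) = sum_{z'} P(z' | y, z) * h(z')."""
    g = {}
    for y, z in product((0, 1), repeat=2):
        p1 = cpd[(y, z)]
        g[(y, z)] = p1 * h(1) + (1 - p1) * h(0)
    return g

g = backproject(h1, P_Z_NEXT)
print(g)
```

The table `g` has four entries regardless of how many other state variables the system has.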
Policy Format
Factored value functions, compact action effect descriptions
[Figure: tree-structured action effects over x, y, z for Action 1 and Action 2, with leaf values +8, +12, +11, +1, +4, +7]
Sorted result values form a decision list:
If […] then action 1, else if […] then action 2, else if […] then action 1
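A decision-list policy is straightforward to execute: scan the rules in order and return the action of the first one whose condition holds. In the sketch below the rule conditions are hypothetical partial assignments to x, y, z; they are not the conditions from the slide, and the `decide` function is likewise an illustrative name.

```python
def decide(state, decision_list, default):
    """Return the action of the first rule whose condition matches the state."""
    for condition, action in decision_list:
        if all(state[var] == val for var, val in condition.items()):
            return action
    return default

# Hypothetical rules, ordered from highest to lowest value.
DECISION_LIST = [
    ({"x": 1, "y": 1}, "action 1"),
    ({"z": 1}, "action 2"),
    ({"x": 0}, "action 1"),
]

a = decide({"x": 0, "y": 1, "z": 1}, DECISION_LIST, default="action 2")
print(a)
```

Each condition mentions only a few variables, so the list can stay short even when the full state space is large.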
Factored Policy Iteration: Summary
Guess V
$\pi$ = greedy(V)
V = value of acting on $\pi$
Structure induces decision-list policy
Key operations isomorphic to BN inference
• Time per iteration reduced from $O((2^n)^3)$ to $O(C_b k^3)$
• $C_b$ = cost of Bayes net inference (function of structure)
• $k$ = number of basis functions ($k \ll 2^n$)
Run Times
Note: Nearly optimal policy found in all cases ( 6).
[Figure: CPU seconds and number of states (vertical axis 0 to 70000) versus number of state variables (4 to 16); series: States, Seconds, 3n^3]
Planning in Relational MDPs
• Replace DBN transition model with dynamic PRM
• Generalize factored policy iteration
  – Define basis functions via relational formulas:
    for each tank x: if closeto(x, base) then 10, else -5
  – Replace BN inference with PRM inference as key step
• Exploit hierarchical structure of complex actions by encapsulating decision making along hierarchy
• Potential benefits:
  – Tractable approximate planning in relational domains
  – Unification of classical and stochastic planning
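One way to read a relational basis function is as a formula evaluated over every object of a class. The sketch below scores each tank with 10 if it is close to base and -5 otherwise; summing the per-tank scores is an assumption of this example, as are the object names and the `tank_basis` function.

```python
def tank_basis(objects, close_to_base):
    """Sum, over every tank x, of 10 if x is close to base and -5 otherwise.

    Aggregating by summation is an assumption; the slide only gives the
    per-object formula.
    """
    total = 0
    for name, cls in objects.items():
        if cls == "tank":
            total += 10 if name in close_to_base else -5
    return total

# Hypothetical situation: two tanks and one truck.
objects = {"t1": "tank", "t2": "tank", "truck1": "truck"}
score = tank_basis(objects, close_to_base={"t1", "truck1"})
print(score)  # t1 contributes 10, t2 contributes -5, the truck is ignored
```

The same definition applies unchanged to a situation with any number of tanks, which is the generalization relational formulas are meant to provide.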
Conclusions: Past & Present
• PRMs compactly represent complex systems with multiple interacting objects:
  – coherent (probabilistic) semantics;
  – structured representation: modularity & reuse.
• Scalable inference that exploits structure
• Tracking algorithms for DBNs that exploit system decomposition
• Planning algorithms in MDPs that exploit structure of system and of value functions
Theme: Representation & inference scale up, if we exploit structure
Conclusions: Future
• Better inference for densely connected PRMs
• Extending PRMs with time, events, actions
• Exploit structure for inference in dynamic PRMs:
  – system decomposition into subsystems
  – relational context
  – varying time granularity
• Planning in dynamic PRMs:
  – extend factored policy iteration to PRMs
  – exploit hierarchical action decomposition
Acknowledgements
• Students & postdocs (PhD students, postdocs, undergrads)
  – Nir Friedman (Hebrew U.)
  – Dirk Ormoneit
  – Ron Parr (Duke)
  – Xavier Boyen
  – Urszula Chajewska
  – Lise Getoor
  – Carlos Guestrin
  – Uri Lerner
  – Uri Nodelman
  – Avi Pfeffer (Harvard)
  – Eran Segal
  – Benjamin Taskar
  – Simon Tong
  – Brian Milch (Berkeley)
  – Ken Takusagawa (MIT)
• Support:
  – PECASE Award via ONR YIP
  – DARPA’s HPKB Program
  – MURI Program “Integrated Approach to Intelligent Systems”
  – Sloan Faculty Fellowship
  – DARPA’s IA Program under subcontract to SRI International
  – DARPA’s DMIF Program under subcontract to IET Inc.
  – ONR grant
http://robotics.stanford.edu/~koller/