An LP-Based Heuristic for Optimal Planning
Menkes van den Briel
Department of Industrial Engineering, Arizona State University

Subbarao Kambhampati
Department of Computer Science, Arizona State University

Thomas Vossen
Leeds School of Business, University of Colorado at Boulder

J. Benton
Department of Computer Science, Arizona State University
http://rakaposhi.eas.asu.edu/yochan/
What is automated planning?
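[Figure: example states over two locations, loc1 and loc2]
Initial state: $s_0 \in S$
Goal: $s_* \in S$
Action: $a = \langle \text{pre}, \text{post}, \text{prevail} \rangle$
Plan: $P = \langle a_1, \ldots, a_n \rangle$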
Motivation
• Why heuristics?
  – Heuristic state-space search has been very successful in solving automated planning problems
• Why optimal planning?
  – Real-world planning applications require optimal or near-optimal solutions
    • The difference between a (near-)optimal solution and a feasible solution may be the difference between winning or losing the interest of an investor or strategic partner
LP-based heuristic
Relax the ordering of the actions
Set up an integer programming formulation
Solve the LP relaxation and use the objective function value as an admissible distance estimate
Strengthen the formulation by adding valid inequalities
Action selection formulation
• Represent the planning problem as a set of loosely coupled network flow problems
  – Each state variable defines one network flow problem
  – Nodes correspond to the state variable values
  – Arcs correspond to state variable transitions
Simple logistics example
[Figure: one truck and one package moving between loc1 and loc2. DTG_Package1 has nodes 1, 2, T with arcs Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), Unload(p1,t1,l2). DTG_Truck1 has nodes 1, 2 with arcs Drive(l1,l2), Drive(l2,l1); its nodes also carry Load and Unload labels.]
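The mapping from actions to per-variable networks can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary encoding, the variable names `truck` and `package`, and the helper `build_dtgs` are hypothetical and not taken from the slides. Each action lists the transition it causes on a state variable and the values it merely requires (prevail conditions).

```python
from collections import defaultdict

# Hypothetical encoding of the one-truck, one-package example.
# "effect" maps a state variable to its (pre, post) transition;
# "prevail" maps a state variable to a value that must hold but is not changed.
ACTIONS = {
    "Drive(l1,l2)": {"effect": {"truck": ("1", "2")}, "prevail": {}},
    "Drive(l2,l1)": {"effect": {"truck": ("2", "1")}, "prevail": {}},
    "Load(p1,t1,l1)": {"effect": {"package": ("1", "T")}, "prevail": {"truck": "1"}},
    "Load(p1,t1,l2)": {"effect": {"package": ("2", "T")}, "prevail": {"truck": "2"}},
    "Unload(p1,t1,l1)": {"effect": {"package": ("T", "1")}, "prevail": {"truck": "1"}},
    "Unload(p1,t1,l2)": {"effect": {"package": ("T", "2")}, "prevail": {"truck": "2"}},
}


def build_dtgs(actions):
    """Return one network per state variable.

    Each network has "arcs" (transition -> actions causing it) and
    "prevail" (value -> actions that require it without changing it).
    """
    dtgs = defaultdict(lambda: {"arcs": defaultdict(list), "prevail": defaultdict(list)})
    for name, action in actions.items():
        for var, (pre, post) in action["effect"].items():
            dtgs[var]["arcs"][(pre, post)].append(name)
        for var, value in action["prevail"].items():
            dtgs[var]["prevail"][value].append(name)
    return dtgs


dtgs = build_dtgs(ACTIONS)
for var, dtg in dtgs.items():
    print(var, dict(dtg["arcs"]), dict(dtg["prevail"]))
```

The networks are only loosely coupled: the truck network knows about Load and Unload solely through the prevail labels on its nodes.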
Action selection formulation
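• Variables
  – $x_a \in \mathbb{Z}_+$ for $a \in A$; $x_a$ is the number of times action $a$ is executed
  – Note: the variables carry no time indices and no upper bound
• Objective function
  – $\min \sum_{a \in A} x_a$
• Constraints, for all $c \in C$, $f \in V_c$
  – Flow:
    $$\sum_{e \in V_c^+(f):\, a \in A_c^E(e)} x_a \;-\; \sum_{e \in V_c^-(f):\, b \in A_c^E(e)} x_b \;\ge\; \begin{cases} 1 & \text{if } f \ne s_0[c],\ f = s_*[c] \\ -1 & \text{if } f = s_0[c],\ f \ne s_*[c] \\ 0 & \text{otherwise} \end{cases}$$
  – Prevail:
    $$x_a \le M \sum_{e \in V_c^+(f):\, b \in A_c^E(e)} x_b \quad \text{for all } f \ne s_0[c],\ a \in A_c^V(f)$$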
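A small checker makes the two constraint families concrete. It is a sketch under stated assumptions, not the authors' implementation: the flow constraint is read as "in-flow minus out-flow is at least the right-hand side", the truck starts at location 2 and has no goal value, the package starts at location 1 and must reach location 2, and `M = 1000` is an arbitrary big-M constant. The names `S0`, `GOAL`, `flow`, and `satisfies` are hypothetical.

```python
from fractions import Fraction

M = 1000  # assumed big-M constant; any sufficiently large value would do here

# Assumed instance: truck starts at 2 (no goal), package starts at 1 with goal 2.
S0 = {"truck": "2", "package": "1"}
GOAL = {"package": "2"}
VALUES = {"truck": ["1", "2"], "package": ["1", "2", "T"]}

# action -> (effects {var: (pre, post)}, prevail conditions {var: value})
ACTIONS = {
    "Drive(l1,l2)": ({"truck": ("1", "2")}, {}),
    "Drive(l2,l1)": ({"truck": ("2", "1")}, {}),
    "Load(p1,t1,l1)": ({"package": ("1", "T")}, {"truck": "1"}),
    "Load(p1,t1,l2)": ({"package": ("2", "T")}, {"truck": "2"}),
    "Unload(p1,t1,l1)": ({"package": ("T", "1")}, {"truck": "1"}),
    "Unload(p1,t1,l2)": ({"package": ("T", "2")}, {"truck": "2"}),
}


def flow(x, var, f, end):
    """Sum of counts of actions whose transition on var leaves (end=0) or enters (end=1) f."""
    return sum(x.get(a, 0) for a, (eff, _) in ACTIONS.items()
               if var in eff and eff[var][end] == f)


def rhs(var, f):
    """Right-hand side of the flow constraint for value f of variable var."""
    is_init, is_goal = S0[var] == f, GOAL.get(var) == f
    if is_goal and not is_init:
        return 1
    if is_init and not is_goal:
        return -1
    return 0


def satisfies(x):
    """True if the action counts x satisfy every flow and prevail constraint."""
    for var, values in VALUES.items():
        for f in values:
            if flow(x, var, f, 1) - flow(x, var, f, 0) < rhs(var, f):
                return False
    for a, (_, prevail) in ACTIONS.items():
        for var, f in prevail.items():
            # Prevail constraint applies only to values that are not initially true.
            if f != S0[var] and x.get(a, 0) > M * flow(x, var, f, 1):
                return False
    return True


plan = {"Drive(l2,l1)": 1, "Load(p1,t1,l1)": 1, "Drive(l1,l2)": 1, "Unload(p1,t1,l2)": 1}
lp = {"Load(p1,t1,l1)": 1, "Unload(p1,t1,l2)": 1, "Drive(l2,l1)": Fraction(1, M)}
print(satisfies(plan), sum(plan.values()))
print(satisfies(lp), sum(lp.values()))
```

Under these assumptions both the integer plan and the fractional vector pass the check, while dropping the drive action entirely fails the prevail constraint for Load(p1,t1,l1).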
Simple logistics example
Feasible plan
  – $x_{\text{Drive(l2,l1)}} = 1$
  – $x_{\text{Load(p1,t1,l1)}} = 1$
  – $x_{\text{Drive(l1,l2)}} = 1$
  – $x_{\text{Unload(p1,t1,l2)}} = 1$
Objective value: 4
Plan: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
[Figure: DTG_Package1 and DTG_Truck1 with the arcs used by the plan]
Simple logistics example
LP solution
  – $x_{\text{Load(p1,t1,l1)}} = 1$
  – $x_{\text{Unload(p1,t1,l2)}} = 1$
  – $x_{\text{Drive(l2,l1)}} = 1/M$
Objective value: $2 + 1/M$
Actions used: Drive(l2,l1), Load(p1,t1,l1), Unload(p1,t1,l2) … …
[Figure: DTG_Package1 and DTG_Truck1 with the arcs used by the LP solution]
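The value $2 + 1/M$ can be traced by hand. The derivation below is a sketch under assumptions not stated on the slide: the truck starts at loc2 and has no goal value, the package starts at loc1 with goal loc2, and the flow constraint is read as in-flow minus out-flow being at least the right-hand side. The abbreviations $L_1, L_2, U_1, U_2, D_{21}$ stand for Load and Unload at loc1 and loc2 and for Drive(l2,l1).

```latex
\begin{align*}
x_{U_2} - x_{L_2} &\ge 1
  && \text{(package value 2 is the goal)} \\
x_{L_1} + x_{L_2} - x_{U_1} - x_{U_2} &\ge 0
  && \text{(package value T is neither initial nor goal)} \\
\Rightarrow\quad x_{L_1} &\ge 1 + x_{U_1} \ge 1
  && \text{(add the two inequalities)} \\
x_{L_1} &\le M \, x_{D_{21}}
  && \text{(Load at loc1 needs the truck at loc1, which is not its initial value)} \\
\Rightarrow\quad x_{D_{21}} &\ge 1/M \\
\min \sum_{a \in A} x_a &= x_{L_1} + x_{U_2} + x_{D_{21}} = 1 + 1 + \tfrac{1}{M} = 2 + \tfrac{1}{M}
\end{align*}
```

Unload at loc2 needs no drive in the relaxation because the truck's initial value already satisfies its prevail condition, which is why the LP value falls well short of the plan length 4.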
Preliminary results

Problem        LP-      Lplan   h+      hFF     Optimal
log4-0         16.0*    17      19      19      20
log4-1         14.0*    15      17      17      19
log4-2         10.0*    11      13      13      15
log5-1         12.0*    13      15      15      17
log5-2         6.0*     7       8       8       8
log6-1         10.0*    11      13      13      14
log6-9         18.0*    19      21      21      24
log12-0        32.0*    33      39      39      -
log15-1        54.0*    -       63      66      -
freecell2-1    9        9       9       9       9
freecell2-2    8        8       8       8       8
freecell2-3    8        8       8       9       8
freecell2-4    8        8       8       9       8
freecell2-5    9        9       9       9       9
freecell3-5    12       13      13      14      -
freecell13-3   55       -       -       95      -
freecell13-4   54       -       -       94      -
freecell13-5   52       -       -       94      -
driverlog1     3.0*     7       6       8       7
driverlog2     12.0*    13      14      15      19
driverlog3     8.0*     9       11      11      12
driverlog4     11.0*    12      12      15      16
driverlog6     8.0*     9       10      10      11
driverlog7     11.0*    12      12      15      13
driverlog13    15.0*    16      21      26      -
driverlog19    60.0*    -       89      93      -
driverlog20    60.0*    -       84      106     -
zenotravel1    1        1       1       1       1
zenotravel2    3.0*     5       4       4       6
zenotravel3    4.0*     5       5       5       6
zenotravel4    5.0*     6       6       6       8
zenotravel5    8.0*     9       11      11      11
zenotravel6    8.0*     9       11      13      11
zenotravel13   18.0*    19      23      23      -
zenotravel19   46.0*    -       62      63      -
zenotravel20   50.0*    -       -       69      -
tpp1           3.0*     5       4       4       5
tpp2           6.0*     7       7       7       8
tpp3           9.0*     10      10      10      11
tpp4           12.0*    13      13      13      14
tpp5           15.0*    17      17      17      19
tpp6           21.0*    23      21      21      -
tpp28          150.0*   -       -       88      -
tpp29          -        -       -       104     -
tpp30          174.0*   -       -       101     -
bw-sussman     4        6       5       5       6
bw-12step      4        8       4       7       12
bw-large-a     12       12      12      12      12
bw-large-b     16       18      16      16      18
Strengthening techniques
• Composition of state variables (i.e., fluent merging)
  – Given the domain transition graphs (DTGs) of two state variables $c_1$, $c_2$, the composition of $DTG_{c_1}$ and $DTG_{c_2}$ is the domain transition graph $DTG_{c_1 \| c_2} = (V_{c_1 \| c_2}, E_{c_1 \| c_2})$ where
  – $V_{c_1 \| c_2} = V_{c_1} \times V_{c_2}$
  – $((f_1,g_1),(f_2,g_2)) \in E_{c_1 \| c_2}$ if $f_1, f_2 \in V_{c_1}$, $g_1, g_2 \in V_{c_2}$ and there exists an action $a \in A$ such that one of the following conditions holds
    • $\text{pre}[c_1] = f_1$, $\text{post}[c_1] = f_2$, and $\text{pre}[c_2] = g_1$, $\text{post}[c_2] = g_2$
    • $\text{pre}[c_1] = f_1$, $\text{post}[c_1] = f_2$, and $\text{prevail}[c_2] = g_1$, $g_1 = g_2$
    • $\text{pre}[c_1] = f_1$, $\text{post}[c_1] = f_2$, and $g_1 = g_2$
The term composition is also used in model checking, where it denotes the parallel composition or synchronized product of automata [Cassandras & Lafortune, 1999]
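The composition rule can be sketched as a product construction. The code below is an illustration under assumptions: the action encoding and the helpers `_step` and `compose` are hypothetical, and the three conditions are applied symmetrically, so an action that changes only the second variable while requiring a value of the first also yields an arc. The slide lists the cases with the first variable changing; the symmetric reading is an assumption made here so that Load and Unload appear in the truck and package product.

```python
from itertools import product

# action -> (effects {var: (pre, post)}, prevail conditions {var: value})
ACTIONS = {
    "Drive(l1,l2)": ({"truck": ("1", "2")}, {}),
    "Drive(l2,l1)": ({"truck": ("2", "1")}, {}),
    "Load(p1,t1,l1)": ({"package": ("1", "T")}, {"truck": "1"}),
    "Load(p1,t1,l2)": ({"package": ("2", "T")}, {"truck": "2"}),
    "Unload(p1,t1,l1)": ({"package": ("T", "1")}, {"truck": "1"}),
    "Unload(p1,t1,l2)": ({"package": ("T", "2")}, {"truck": "2"}),
}


def _step(effect, prevail, var, value):
    """Value of var after the action, or None if the action cannot fire at value."""
    if var in effect:
        pre, post = effect[var]
        return post if value == pre else None
    if var in prevail:
        return value if value == prevail[var] else None
    return value  # variable not mentioned by the action: unchanged


def compose(values1, values2, actions, c1, c2):
    """Return (nodes, arcs) of the composed graph; arcs map (source, target) to action names."""
    nodes = list(product(values1, values2))
    arcs = {}
    for name, (effect, prevail) in actions.items():
        if c1 not in effect and c2 not in effect:
            continue  # the action changes neither variable, so it adds no arc
        for f1, g1 in nodes:
            f2 = _step(effect, prevail, c1, f1)
            g2 = _step(effect, prevail, c2, g1)
            if f2 is not None and g2 is not None:
                arcs.setdefault(((f1, g1), (f2, g2)), set()).add(name)
    return nodes, arcs


nodes, arcs = compose(["1", "2"], ["1", "2", "T"], ACTIONS, "truck", "package")
for (source, target), names in sorted(arcs.items()):
    print(source, "->", target, sorted(names))
```

In the product graph, Load(p1,t1,l1) exists only as the arc from (1,1) to (1,T), so flow can reach it only through a drive arc into a node where the truck is at location 1.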
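Example
• Two DTGs and their composition
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2; and the composition DTG_c1||c2 with nodes (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1), (f3,g2). Arcs are labeled with the actions a, b, c, d.]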
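Example
• Two DTGs and their composition
  – Small in-arcs denote the initial state
  – Double circles denote the goal
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2; and the composition DTG_c1||c2 with nodes (f1,g1), (f1,g2), (f2,g1), (f2,g2), (f3,g1). Arcs are labeled with the actions a, b, c, d.]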
Simple logistics example
[Figure: truck and package at loc1 and loc2, and the composed graph DTG_Truck1||Package1 with nodes (1,1), (1,T), (2,T), (2,2), (1,2), (2,1). Arc labels: Drive(l1,l2), Drive(l2,l1), Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), Unload(p1,t1,l2).]
Simple logistics example
LP solution on DTG_Truck1||Package1
  – $x_{\text{Drive(l2,l1)}} = 1$
  – $x_{\text{Load(p1,t1,l1)}} = 1$
  – $x_{\text{Drive(l1,l2)}} = 1$
  – $x_{\text{Unload(p1,t1,l2)}} = 1$
Objective value: 4
Plan: Drive(l2,l1), Load(p1,t1,l1), Drive(l1,l2), Unload(p1,t1,l2)
[Figure: DTG_Truck1||Package1 with nodes (1,1), (1,T), (2,T), (2,2), (1,2), (2,1) and the arcs used by the LP solution]
Another example
• Two DTGs and their composition
[Figure: DTG_c1 with nodes f1, f2, f3; DTG_c2 with nodes g1, g2, g3; and the composition DTG_c1||c2 with nodes (f1,g1), (f1,g2), (f1,g3), (f2,g1), (f2,g2), (f2,g3), (f3,g1), (f3,g2), (f3,g3)]
Another example
• Two DTGs and their composition
  – Solution to the individual state variables
[Figure: DTG_c1 (nodes f1, f2, f3), DTG_c2 (nodes g1, g2, g3), and DTG_c1||c2 (nine nodes, (f1,g1) through (f3,g3)); the solution arcs in DTG_c1 and DTG_c2 are labeled a and b]
Another example
• Two DTGs and their composition
  – Solution to the individual state variables represented in the composed state variable
  – Violates balance of flow constraints
[Figure: DTG_c1 (nodes f1, f2, f3), DTG_c2 (nodes g1, g2, g3), and DTG_c1||c2 (nine nodes, (f1,g1) through (f3,g3)); the arcs a and b of the individual solutions are drawn in the composed graph]
Another example
• Two DTGs and their composition
  – Adding new balance of flow constraints strengthens the formulation
[Figure: DTG_c1 (nodes f1, f2, f3), DTG_c2 (nodes g1, g2, g3), and DTG_c1||c2 (nine nodes, (f1,g1) through (f3,g3)); arcs are labeled a, b, c, d, e]
Identifying mergeable fluents
• When should we create a composition of two or more state variables?
  – Look at the causal graph
  – Look at the actions that introduce dependencies in the causal graph
[Figure: a causal graph over Person 1, Person 2, Airplane 1, Airplane 2, Fuel 1, Fuel 2, and a second causal graph over Person 1, Person 2, Airplane 1 || Fuel 1, Airplane 2 || Fuel 2]
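One way to read the causal-graph bullets is sketched below. Everything here is an assumption for illustration: the four actions are invented for a single person, airplane, and fuel variable, the arc rule (an action that mentions u and changes v adds an arc from u to v) is one common definition of a causal graph, and treating two-way dependencies as merge candidates is a guess at a reasonable criterion, not the authors' stated rule.

```python
from itertools import permutations

# Hypothetical actions: action -> (effects {var: (pre, post)}, prevail {var: value})
ACTIONS = {
    "board(p1,a1,c1)": ({"person1": ("c1", "a1")}, {"airplane1": "c1"}),
    "debark(p1,a1,c2)": ({"person1": ("a1", "c2")}, {"airplane1": "c2"}),
    "fly(a1,c1,c2)": ({"airplane1": ("c1", "c2"), "fuel1": ("full", "empty")}, {}),
    "refuel(a1)": ({"fuel1": ("empty", "full")}, {}),
}


def causal_graph(actions):
    """Arcs (u, v): some action mentions u (effect or prevail) and changes v."""
    arcs = set()
    for effect, prevail in actions.values():
        mentioned = set(effect) | set(prevail)
        for u, v in permutations(mentioned, 2):
            if v in effect:
                arcs.add((u, v))
    return arcs


def merge_candidates(arcs):
    """Unordered pairs of variables that depend on each other in both directions."""
    return {frozenset((u, v)) for (u, v) in arcs if (v, u) in arcs}


graph = causal_graph(ACTIONS)
candidates = merge_candidates(graph)
print(sorted(graph))
print([sorted(pair) for pair in candidates])
```

With these invented actions the only two-way dependency is between the airplane and its fuel, which matches the pairing drawn in the figure.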
Experimental setup
• Objective
  – Minimize number of actions
• Domains
  – Selected domains from the International Planning Competition
    • Logistics
    • Freecell
    • Driverlog
    • Zenotravel
    • TPP
    • Blocksworld
• Resources
  – 2.67 GHz Linux machine
  – 1 GB memory
  – 15 minutes runtime
  – CPLEX 10.0
Experimental setup
• Distance estimates
  – LP
    • Action selection formulation with strengthening
  – LP-
    • Action selection formulation without strengthening
  – Lplan
    • Step-based integer programming formulation by Lplan [Bylander, 1997]
  – h+
    • Optimal relaxed plan when the delete effects are ignored
  – hFF
    • Inadmissible but efficient relaxed plan heuristic by FF [Hoffmann and Nebel, 2001]
  – Optimal
    • Optimal distance given by Satplanner using the -opt flag [Rintanen, Heljanko, and Niemelä, 2005]
Experimental results

Problem        LP       LP-      Lplan   h+      hFF     Optimal
log4-0         20       16.0*    17      19      19      20
log4-1         19       14.0*    15      17      17      19
log4-2         15       10.0*    11      13      13      15
log5-1         17       12.0*    13      15      15      17
log5-2         8        6.0*     7       8       8       8
log6-1         14       10.0*    11      13      13      14
log6-9         24       18.0*    19      21      21      24
log12-0        42       32.0*    33      39      39      -
log15-1        67       54.0*    -       63      66      -
freecell2-1    9        9        9       9       9       9
freecell2-2    8        8        8       8       8       8
freecell2-3    8        8        8       8       9       8
freecell2-4    8        8        8       8       9       8
freecell2-5    9        9        9       9       9       9
freecell3-5    12       12       13      13      14      -
freecell13-3   55       55       -       -       95      -
freecell13-4   54       54       -       -       94      -
freecell13-5   52       52       -       -       94      -
driverlog1     7        3.0*     7       6       8       7
driverlog2     19       12.0*    13      14      15      19
driverlog3     11       8.0*     9       11      11      12
driverlog4     15.5     11.0*    12      12      15      16
driverlog6     11       8.0*     9       10      10      11
driverlog7     13       11.0*    12      12      15      13
driverlog13    24       15.0*    16      21      26      -
driverlog19    96.6*    60.0*    -       89      93      -
driverlog20    89.5*    60.0*    -       84      106     -
zenotravel1    1        1        1       1       1       1
zenotravel2    6        3.0*     5       4       4       6
zenotravel3    6        4.0*     5       5       5       6
zenotravel4    8        5.0*     6       6       6       8
zenotravel5    11       8.0*     9       11      11      11
zenotravel6    11       8.0*     9       11      13      11
zenotravel13   24       18.0*    19      23      23      -
zenotravel19   66.2*    46.0*    -       62      63      -
zenotravel20   68.3*    50.0*    -       -       69      -
tpp1           5        3.0*     5       4       4       5
tpp2           8        6.0*     7       7       7       8
tpp3           11       9.0*     10      10      10      11
tpp4           14       12.0*    13      13      13      14
tpp5           19       15.0*    17      17      17      19
tpp6           25       21.0*    23      21      21      -
tpp28          -        150.0*   -       -       88      -
tpp29          -        -        -       -       104     -
tpp30          -        174.0*   -       -       101     -
bw-sussman     4        4        6       5       5       6
bw-12step      4        4        8       4       7       12
bw-large-a     12       12       12      12      12      12
bw-large-b     16       16       18      16      16      18

Distance estimates from the initial state to the goal (highlighted values equal the optimal distance)
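The distinction between admissible and inadmissible estimates can be checked directly against a few rows of the results. The sketch below copies six rows for which an optimal distance is reported (with the `*` markers dropped) and lists, per heuristic, the instances where the estimate exceeds the optimum. The names `ROWS`, `HEURISTICS`, and `overestimates` are hypothetical, and a six-row sample is of course no proof of admissibility.

```python
# (LP, LP-, Lplan, h+, hFF, Optimal), copied from the results table
ROWS = {
    "log4-0": (20, 16.0, 17, 19, 19, 20),
    "freecell2-3": (8, 8, 8, 8, 9, 8),
    "driverlog4": (15.5, 11.0, 12, 12, 15, 16),
    "driverlog7": (13, 11.0, 12, 12, 15, 13),
    "zenotravel6": (11, 8.0, 9, 11, 13, 11),
    "bw-12step": (4, 4, 8, 4, 7, 12),
}
HEURISTICS = ("LP", "LP-", "Lplan", "h+", "hFF")


def overestimates(rows):
    """Map each heuristic to the instances where its estimate exceeds the optimum."""
    result = {}
    for problem, values in rows.items():
        *estimates, optimal = values
        for name, value in zip(HEURISTICS, estimates):
            if value > optimal:
                result.setdefault(name, []).append(problem)
    return result


violations = overestimates(ROWS)
print(violations)
```

On this sample only hFF overshoots the optimal distance, which is consistent with its description as an inadmissible heuristic.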
Experimental results
• Heuristic calculation time
[Figure: heuristic calculation time on a logarithmic axis from 0.01 to 1000 for LP, LP-, Lplan, and h+, grouped by domain: Logistics, Freecell, Driverlog, Zenotravel, TPP, Blocks]
Conclusions and future work
• An LP-based heuristic that respects delete effects but ignores action ordering shows very promising results
  – Finds the optimal distance estimate in several problem instances
  – Can be used to calculate admissible distance estimates for various optimization problems in planning
  – Ongoing work has successfully incorporated our LP-based heuristic in a search algorithm that solves oversubscription planning
• Interesting directions for future work
  – Apply fluent merging more aggressively
  – Extend the formulation into a complete planning system