Cross-Domain Action-Model Acquisition for Planning via Web Search
Hankz Hankui Zhuo(a), Qiang Yang(b), Rong Pan(a) and Lei Li(a)
(a) Sun Yat-sen University, China; (b) Hong Kong University of Science & Technology, Hong Kong
Motivation
There are many domains that share knowledge with each other, e.g.,
"walking" in the driverlog domain
"navigating" in the rovers domain
"moving" in the elevator domain
etc.
(Image credits: http://www.superstock.com/stock-photos-images/1778R-4701, http://www.pixelparadox.com/mars.htm, http://www.venusengineers.com/goods-lift.html)
Motivation
These actions all share common knowledge about location change; thus, it may be possible to "borrow" knowledge from one domain for another. Specifically:
Motivation
For example, the model of "walk" in driverlog is:
walk(?d - driver ?l1 - loc ?l2 - loc)
 :precondition (and (at ?d ?l1) (path ?l1 ?l2))
 :effect (and (not (at ?d ?l1)) (at ?d ?l2))
Motivation
Given the model of "walk", can we guess the model of "navigate" in rovers?
navigate(?d - rover ?x - waypoint ?y - waypoint)
 :precondition ??
 :effect ??
Motivation
A plausible guess, borrowed from "walk":
walk(?d - driver ?l1 - loc ?l2 - loc)
 :precondition (and (at ?d ?l1) (path ?l1 ?l2))
 :effect (and (not (at ?d ?l1)) (at ?d ?l2))
navigate(?d - rover ?x - waypoint ?y - waypoint)
 :precondition (at ?d ?x) (visible ?x ?y) …
 :effect (not (at ?d ?x)) (at ?d ?y)
Motivation
In this work, we aim to learn action models for a target domain, e.g., the model of "navigate" in rovers, by transferring knowledge from another domain, called the source domain, e.g., the model of "walk" in driverlog.
Problem Formulation
Formally, our learning problem can be stated as follows.
Given as inputs:
Action models from a source domain: As
A few plan traces from the target domain: {<s0,a1,s1,…,an,sn>}, where si is a partial state and ai is an action
Action schemas from the target domain: A'
Predicates from the target domain: P
Output: action models in the target domain: At
Problem Formulation
Our assumptions are:
the domains are STRIPS domains;
people do not write action names randomly, e.g., they do not use "eat" to express "move";
we need not observe full intermediate states in plan traces, i.e., intermediate states can be partial or empty;
action sequences in plan traces are correct;
actions in plan traces are totally ordered, i.e., there are no concurrent actions;
there is information available on the Web related to the actions.
Our Algorithm: LAWS
(Flowchart: the plan traces, the predicates and action schemas of the target domain, and the source action models As feed a "Build constraints" step, which produces Web constraints, State constraints, Action constraints and Plan constraints; a "Solve constraints" step then outputs the target action models At.)
Web constraints: constraints from web searching.
State constraints: constraints from the states between actions.
Action constraints: constraints imposed on action models.
Plan constraints: constraints to ensure causal links in traces.
Solve constraints: solving is done with a weighted MAXSAT solver.
Web constraints
Web constraints are used to measure the similarity between two actions; to do this, we search for the two actions on the Web.
Specifically, we build predicate-action pairs from the target domain as follows:
PA_t = { <p, a> | p ∈ P, a ∈ A', PARA(p) ⊆ PARA(a) }
where p is a predicate, a is an action schema, and p's parameters are included in a's.
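For intuition, here is a minimal Python sketch of building PA_t under this parameter-containment rule; the predicate/schema names and the dict-of-type-lists representation are assumptions for illustration, not the paper's implementation.

```python
# Sketch: build the target predicate-action pair set
# PA_t = { <p, a> | p in P, a in A', PARA(p) included in PARA(a) }.
# Predicate and schema names here (at, visible, navigate) are illustrative.

def contains(a_params, p_params):
    """True iff every parameter type of p also occurs in a, with multiplicity."""
    remaining = list(a_params)
    for t in p_params:
        if t not in remaining:
            return False
        remaining.remove(t)
    return True

def build_pa_t(predicates, schemas):
    """predicates/schemas: dicts mapping a name to its list of parameter types."""
    return [(p, a)
            for p, p_params in predicates.items()
            for a, a_params in schemas.items()
            if contains(a_params, p_params)]

predicates = {"at": ["rover", "waypoint"], "visible": ["waypoint", "waypoint"]}
schemas = {"navigate": ["rover", "waypoint", "waypoint"]}
pairs = build_pa_t(predicates, schemas)
```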
Web constraints
Similarly, we build predicate-action pairs from the source:
PA_s^pre = { <p, a> | a ∈ As, p ∈ PRE(a) }
PA_s^add = { <p, a> | a ∈ As, p ∈ ADD(a) }
PA_s^del = { <p, a> | a ∈ As, p ∈ DEL(a) }
where PA_s^pre, PA_s^add and PA_s^del denote the sets of precondition-action pairs, add-action pairs and del-action pairs, respectively.
Note that here we require p ∈ PRE(a) (and likewise for ADD and DEL), which is different from PA_t.
Web constraints
Next, we collect a set of web documents D = {di} by searching the keyword w = <p,a> ∈ PA_t.
We represent each page di as a vector yi by calculating its tf-idf weights (Jones 1972). As a result, we have a set of real-valued vectors Y = {yi}.
Likewise, we get a set of vectors X = {xi} by searching the keyword w' = <p',a'> ∈ PA_s^pre.
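The tf-idf step can be sketched in a few lines of plain Python; the toy token lists stand in for real search-result pages, and this is an illustrative implementation of standard tf-idf weighting, not the authors' code.

```python
import math
from collections import Counter

# Sketch: represent each retrieved web page as a tf-idf vector (Jones 1972).
# docs is a list of tokenized pages; the tokens below are illustrative.

def tfidf_vectors(docs):
    """Return one dict {term: tf-idf weight} per document."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return vectors

docs = [["rover", "navigate", "mars"],
        ["driver", "walk", "road"],
        ["rover", "mars", "mars"]]
vecs = tfidf_vectors(docs)
```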
Web constraints
We define the similarity function between two keywords w and w' as follows:
similarity(w, w') = MMD^2(F, Y, X),
where MMD is the Maximum Mean Discrepancy, given by (Borgwardt et al. 2006), and F is a set of feature mapping functions of a Gaussian kernel. Concretely:
MMD^2(F, Y, X) = 1/(m(m-1)) Σ_{i≠j} k(x_i, x_j) + 1/(n(n-1)) Σ_{i≠j} k(y_i, y_j) - 2/(mn) Σ_{i,j} k(x_i, y_j),
where k is the Gaussian kernel k(x_i, y_j) = exp( -||x_i - y_j||^2 / (2σ^2) ).
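A minimal sketch of this estimator, assuming plain tuples as vectors and a free bandwidth parameter sigma (the bandwidth choice is not specified in the slides):

```python
import math

# Sketch of squared MMD with a Gaussian kernel (Borgwardt et al. 2006),
# used here as similarity(w, w') = MMD^2(F, Y, X). Sample vectors are tuples.

def gaussian_kernel(x, y, sigma=1.0):
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    m, n = len(X), len(Y)
    xx = sum(gaussian_kernel(X[i], X[j], sigma)
             for i in range(m) for j in range(m) if i != j)
    yy = sum(gaussian_kernel(Y[i], Y[j], sigma)
             for i in range(n) for j in range(n) if i != j)
    xy = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2.0 * xy / (m * n)

# Similar samples give a small discrepancy; well-separated samples a large one.
near = mmd2([(0.0,), (0.1,)], [(0.05,), (0.15,)])
far = mmd2([(0.0,), (0.1,)], [(5.0,), (5.1,)])
```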
Web constraints
Finally, we generate weighted web constraints by the following steps. For each w = <p,a> ∈ PA_t and w' = <p',a'> ∈ PA_s^pre, we:
calculate similarity(w, w');
generate a constraint p ∈ PRE(a), and associate it with similarity(w, w') as its weight.
We proceed likewise for ADD(a) and DEL(a).
State constraints (given by Yang et al. 2007)
Generally, if p frequently appears before a, it is probably a precondition of a. Specifically:
if p appears before a, and PARA(p) ⊆ PARA(a), then p ∈ PRE(a);
if p appears after a, and PARA(p) ⊆ PARA(a), then p ∈ ADD(a).
The weights of all these constraints are calculated by counting their occurrences in all the plan traces.
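As a sketch (not the authors' implementation), weighting the candidate constraint p ∈ PRE(a) by counting how often p is observed just before a could look like this; the encoding of a trace as alternating state-sets and action names is an assumption:

```python
from collections import Counter

# Sketch: weight "p in PRE(a)" by how often predicate p is observed in the
# partial state immediately before action a, over all plan traces.
# Predicate and action names below are illustrative.

def precondition_weights(traces):
    """traces: list of [s0, a1, s1, a2, s2, ...] with states as sets of predicates."""
    counts = Counter()
    for trace in traces:
        states, actions = trace[0::2], trace[1::2]
        for state, action in zip(states, actions):
            for p in state:            # p observed just before this action
                counts[(p, action)] += 1
    return counts

traces = [[{"at"}, "navigate", {"at"}, "navigate", set()],
          [{"at", "visible"}, "navigate", set()]]
w = precondition_weights(traces)
```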
Action constraints (given by Yang et al. 2007)
Action constraints are imposed to ensure the learned action models are succinct:
p ∈ ADD(a) ⇒ p ∉ PRE(a).
These constraints are associated with the maximal weight of all the state constraints, to ensure they are maximally satisfied.
Plan constraints (given by Yang et al. 2007)
We require that causal links in plan traces are not broken. Thus, we build constraints as follows.
For each precondition p of an action aj in a plan trace, either p is in the initial state, or there is some ai prior to aj that adds p, and no ak between ai and aj (i < k < j) that deletes p:
p ∈ Pre(aj) ⇒ p ∈ Add(ai) ∧ p ∉ Del(ak).
For each literal q in the goal, either q is in the initial state s0, or there is some ai that adds q and no later ak that deletes q:
q ∈ s0 ∨ (q ∈ Add(ai) ∧ q ∉ Del(ak)).
To ensure these constraints are maximally satisfied, we assign them the maximal weight of the state constraints.
Solve constraints
Before solving all these constraints, we adjust the weights of the web constraints by replacing each original weight wo with wo':
wo' = γ/(1-γ) · wm · wo,
where wm is the maximal weight of the state constraints, and γ ∈ [0,1).
We can easily adjust wo' from 0 to +∞ by varying γ from 0 to 1.
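A one-line sketch of this reweighting; the function name is invented for illustration:

```python
# Sketch: adjust a web-constraint weight wo by the slides' formula
# wo' = (gamma / (1 - gamma)) * wm * wo, with gamma in [0, 1).

def adjust_weight(wo, wm, gamma):
    assert 0.0 <= gamma < 1.0
    return gamma / (1.0 - gamma) * wm * wo

# gamma = 0 switches web knowledge off; gamma -> 1 grows wo' without bound.
```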
Solve constraints
We solve these weighted constraints by running a weighted MAXSAT solver.
The attained assignment is converted to action models, e.g., if "p ∈ ADD(a)" is assigned true, p is converted into an effect of a.
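To illustrate the solving step, here is a brute-force weighted MAX-SAT sketch over two propositional variables; real systems use dedicated MAXSAT solvers, and variable names like p_in_PRE (standing for "p ∈ PRE(a)") and the clause weights are invented. The third clause encodes the action constraint p ∈ ADD(a) ⇒ p ∉ PRE(a).

```python
from itertools import product

# Sketch: a tiny brute-force weighted MAX-SAT solver. Each clause is
# (weight, [literals]); a literal (var, polarity) is satisfied when the
# assignment gives var that polarity.

def max_sat(variables, clauses):
    best, best_assign = -1.0, None
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        score = sum(w for w, lits in clauses
                    if any(assign[v] == pol for v, pol in lits))
        if score > best:
            best, best_assign = score, assign
    return best_assign, best

variables = ["p_in_PRE", "p_in_ADD"]
clauses = [
    (3.0, [("p_in_ADD", True)]),                        # web/state evidence
    (1.0, [("p_in_PRE", True)]),                        # weaker evidence
    (5.0, [("p_in_PRE", False), ("p_in_ADD", False)]),  # ADD(a) => not PRE(a)
]
assign, score = max_sat(variables, clauses)
# p is assigned to ADD(a) only, so it becomes an effect of a.
```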
Experimental Result
Example result:
(:action walk (?d - rover ?x - waypoint ?y - waypoint)
 :precondition (and (at ?d ?x) (visible ?x ?y))
 :effect (and (not (at ?d ?x)) (at ?d ?y) (not (visible ?x ?y))))
(The slide marks one missing condition and one extra condition in this model.)
By comparing to hand-written action models, we identify the missing and extra conditions. We calculate the error rate by counting all missing and extra conditions, and from it obtain the accuracy.
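Counting missing and extra conditions can be sketched as a set comparison; the encoding of a model as (preconditions, adds, dels) sets of predicate strings, and the example conditions, are assumptions for illustration:

```python
# Sketch: count missing and extra conditions of a learned model against a
# hand-written reference. Each model is (preconditions, adds, dels), with
# conditions as strings; the condition strings below are illustrative.

def error_counts(learned, reference):
    missing = sum(len(r - l) for l, r in zip(learned, reference))
    extra = sum(len(l - r) for l, r in zip(learned, reference))
    return missing, extra

learned = ({"at ?d ?x", "visible ?x ?y"},
           {"at ?d ?y"},
           {"at ?d ?x", "visible ?x ?y"})   # one extra delete effect
reference = ({"at ?d ?x", "visible ?x ?y"},
             {"at ?d ?y"},
             {"at ?d ?x"})
missing, extra = error_counts(learned, reference)
```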
Experimental Result
We compared LAWS to t-LAMP (Zhuo et al. 2009) and ARMS (Yang et al. 2007): t-LAMP "borrows" knowledge by building syntax mappings, while ARMS learns without "borrowing" knowledge.
The results are shown below:
Experimental Result
We can see that LAWS > t-LAMP > ARMS: the accuracies of LAWS are higher than those of t-LAMP and ARMS, which empirically shows the advantage of LAWS. Accuracies increase as the number of plan traces increases, which is consistent with our intuition, since more information helps learning.
Experimental Result
We also test the following three cases:
Case I (γ = 0): not borrowing knowledge;
Case II (γ = 0.5 and wo = 1): all web constraints have the same weight, i.e., the similarity function is not used;
Case III (γ = 0.5): using the similarity function.
The results are shown below:
Experimental Result
We can see that:
Case III > the other two: this suggests the similarity function really helps improve the learning result;
Case II > Case I: this suggests that web constraints are helpful.
Experimental Result
Next, we test different ratios of observed states. Accuracy generally increases as the ratio increases. This is consistent with our intuition, since the additional information helps improve the learning result.
Experimental Result
We also test different values of γ. When γ increases from 0 to 0.5, the accuracy increases: as the effect of web knowledge grows, the accuracy gets higher. However, when γ is larger than 0.5, the accuracy decreases as γ increases, because the impact of the plan traces is relatively reduced. This suggests that knowledge from plan traces is also important for learning high-quality action models.
CPU Time
The CPU time is less than 1,000 seconds on a typical 2 GHz PC with 1 GB memory, which is quite reasonable for learning. However, this does not include web-searching time, since that mainly depends on network quality.
Conclusion
In this paper, we propose an algorithm framework to "borrow" knowledge from another domain via web search, and we empirically show the resulting improvement in learning quality.
Our work can be extended to more complex action models, e.g., PDDL models, and to multi-task action-model acquisition.
Thank You