

Learning-Assisted Automated Planning
Looking Back, Taking Stock, Going Forward

Terry Zimmerman and Subbarao Kambhampati

Copyright © 2003, American Association for Artificial Intelligence. All rights reserved.

This article reports on an extensive survey and analysis of research work related to machine learning as it applies to automated planning over the past 30 years. Major research contributions are broadly characterized by learning method and then by descriptive subcategories. Survey results reveal learning techniques that have been extensively applied and a number that have received scant attention. We extend the survey analysis to suggest promising avenues for future research in learning based on both previous experience and current needs in the planning community.

In this article, we consider the symbiosis of two of the most broadly recognized hallmarks of intelligence: (1) planning—solving problems in which one uses beliefs about actions and their consequences to construct a sequence of actions that achieve one's goals—and (2) learning—using past experience and percepts to improve one's ability to act in the future. Within the AI research community, machine learning is viewed as a potentially powerful means of endowing an agent with greater autonomy and flexibility, often compensating for the designer's incomplete knowledge of the world that the agent will face and incurring low overhead in terms of human oversight and control. If we view a computer program with learning capabilities as an agent, then we can say that learning takes place as a result of the interaction of the agent and the world and observation by the agent of its own decision-making processes. Planning is one such decision-making process that such an agent might undertake, and a corpus of work spanning some 30 years attests that it is an interesting, broad, and fertile field in which learning techniques can be applied to advantage. We focus here on this learning-in-planning research and utilize both tables and graphic maps of existing studies to spotlight the combinations of planning-learning methods that have received the most attention as well as those that have scarcely been explored. We do not attempt to provide, in this limited space, a tutorial of the broad range of planning and learning methodologies, assuming instead that the interested reader has at least passing familiarity with these fields.

A cursory review of the state of the art in learning in planning during the early to mid-1990s reveals that the primary impetus for learning was to make up for often debilitating weaknesses in the planners themselves. The general-purpose planning systems of even a decade ago struggled to solve simple problems in the classical benchmark domains; blocks world problems of 10 blocks lay beyond their capabilities, as did most logistics problems (Kodratoff and Michalski 1990; Minton 1993). The planners of the period used only weak guidance in traversing their search spaces, so it is not surprising that augmenting the systems to learn some such guidance was often a winning strategy. Relative to the largely naïve base planner, the learning-enhanced systems demonstrated improvements in both the size of problems that could be addressed and the speed with which they could be solved (Kambhampati, Katukam, and Qu 1996; Leckie and Zukerman 1998; Minton et al. 1989; Veloso and Carbonell 1993).



With the advent of several new genres of planning systems in the past five to six years, the entire base-performance level against which any learning-augmented system must compare has shifted dramatically. It is arguably a more difficult proposition to accelerate a planner of this generation by outfitting it with some form of online learning because the overhead cost incurred by the learning system can overwhelm the gains in search efficiency. This, in part, might explain why the planning community appears to have paid less attention to learning in recent years. From the machine learning community's perspective, Langley (1997, p. 18) remarked on the swell of research in learning for problem solving and planning that took place in the 1980s and noted the subsequent tail-off: "One source is the absence of robust algorithms for learning in natural language, planning, scheduling, and configuration, but these will come only if basic researchers regain their interest in these problems."

Of course, interest in learning within the planning community should not be limited to anticipated speedup benefits. As automated planning has advanced its reach to the point where it can cross the threshold from toy problems to some interesting real-world applications, a variety of issues come into focus. They range from dealing with incomplete and uncertain environments to developing an effective interface with human users.

Our purpose in this article is to develop, using an extensive survey of published work, a broad perspective on the diverse research that has been conducted to date in learning in planning and to conjecture about profitable directions for future work in this area. The remainder of the article is organized into three parts: (1) what learning is likely to be of assistance in automated planning, (2) what roles learning has actually played in the relevant planning research conducted to date, and (3) where the research community might gainfully direct its attention in the near future. In the section entitled Where Learning Might Assist Planning, we describe a set of five dimensions for classifying learning-in-planning systems with respect to properties of both the underlying planning engine and the learning component. By mapping the breadth of the surveyed work along these dimensions, we reveal some underlying research trends, patterns, and possible oversights. This mapping motivates our speculation in the final section on some promising directions for such research in the near future, given our current generation of planning systems.

Where Learning Might Assist Planning

In a number of ways, automated planning presents a fertile field for the application of machine learning. The simple (STRIPS) planning problem itself has been shown to be PSPACE-complete (Bylander 1992); thus, for planning systems to handle problems large enough to be of interest, they must greatly reduce the size of the search space they traverse. Indeed, the great preponderance of planning research, from alternate formulations of the planning problem to the design of effective search heuristics, can be seen as addressing this problem of pruning the search space. It is therefore not surprising that the earliest and most widespread application of learning to automated planning has focused on expediting solution search.

As automated planning advanced beyond solving trivial problems, the issue of plan quality received increased attention. Although there are often many valid plans for a given problem, generating one judged acceptable by the user or optimizing over several quality metrics can increase the complexity of the planning task immensely. A learning-augmented planning system that can perceive a user's preferences and bias its subsequent search accordingly offers a means of reducing this complexity. Learning seems to have an obvious role in mixed-initiative planning, where it might be imperative to perceive and accommodate the expertise, preferences, and idiosyncrasies of humans. Finally, expanding our view to a real-world situation in which a planning system might operate, we are likely to confront uncertainty as a fact of life, and complete and robust domain theories are rare. As we show, the study of machine learning methods in planning approaches that address uncertainty is in its infancy.

Machine learning offers the promise of addressing such issues by endowing the planning system with the ability to profit from observation of its problem space and its decision-making experience, whether or not its currently preferred decision leads to success. However, actually realizing this promise within a given application challenges the planning system designer on many fronts. Success generally depends heavily on complex relationships and interconnections between planning and learning. In figure 1, we suggest five dimensions that capture perhaps the most important of these system design issues: (1) type of planning problem, (2) approach to planning, (3) goal for the learning component, (4) planning-execution phase in which learning is conducted, and (5) type of learning method.


We hope to show that this set of dimensions is useful both in gaining perspective on the work that has been done in learning-augmented planning and in speculating about profitable directions for future research. Admittedly, these are not independent or orthogonal dimensions; they also do not make up an exhaustive list of relevant factors in the design of an effective learning component for a given planner. Among other candidate dimensions that could have been included are type of plan (for example, conditional, conformant, serial, or parallel actions), type of knowledge learned (domain or search control), learning impetus (data driven or knowledge driven), and type of organization (hierarchical or flat). Given the corpus of work to date and the difficulty of visualizing and presenting patterns and relationships in high-dimensional data, we settled on the five dimensions of figure 1 as the most revealing. Before reporting on the literature survey, we briefly discuss each of these dimensions.
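To preview how a single system can be indexed along these five dimensions, here is a minimal, hypothetical Python sketch of our own (it is not from the article; the value sets follow figure 1, and the two sample entries anticipate the figure 2 examples discussed later):

```python
from dataclasses import dataclass

# Value sets taken from the five dimensions of figure 1.
PROBLEM_TYPES = {"classical", "beyond classical", "full scope"}
APPROACHES = {"state space", "plan space", "CSP", "SAT", "integer programming"}
GOALS = {"speed up planning", "improve plan quality", "learn/improve domain theory"}
PHASES = {"before planning", "during planning", "during plan execution"}

@dataclass
class SurveyEntry:
    """One surveyed planner-learner combination, indexed by the five dimensions."""
    system: str
    problem_type: str
    approach: str
    goals: set            # a system may pursue more than one learning goal
    phases: set
    learning_method: str

prodigy_ebl = SurveyEntry("PRODIGY/EBL (Minton et al. 1989)", "classical",
                          "state space", {"speed up planning"},
                          {"during planning"}, "explanation-based learning")
scope = SurveyEntry("SCOPE (Estlin and Mooney 1996)", "classical",
                    "plan space", {"speed up planning", "improve plan quality"},
                    {"during planning"}, "EBL + inductive logic programming")
```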

Planning Problem Type

The nature of the environment in which the planner must conduct its reasoning defines where a given problem lies in the continuum of classes from classical to full-scope planning. Here, classical planning refers to a world model in which fluents are propositional and they don't change unless the planning agent acts to change them; all relevant attributes can be observed at any time; the impact of executing an action on the environment is known and deterministic; and the effects of taking an action occur instantly. If we relax all these constraints such that fluents can take on a continuous range of values (for example, metric), a fluent might change its value spontaneously or for reasons other than agent actions (for example, the world has hidden variables), the exact impact of acting cannot be predicted, and actions have durations, then we are in the class of full-scope planning problems. In between these extremes lies a wide variety of interesting and practical planning problem types, such as classical planning with a partially observable world (for example, playing poker) and classical planning where actions realistically require significant periods of time to execute (for example, logistics domains). Even the classical planning problem is difficult enough that it largely occupied the full attention of the research community until the past few years. The current extension into various neoclassical, temporal, and metric planning modes has been spurred in part by impressive advances in automated planning technology over the past six years or so.
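To ground the classical model, here is a minimal illustrative sketch (ours, not the article's) of a STRIPS-style action and state progression over propositional fluents:

```python
# A STRIPS-style action: preconditions plus add/delete effects over
# propositional fluents, applied to a state represented as a set of atoms.
BLOCKS_ACTION = {
    "name": "pickup(A)",
    "pre": {"clear(A)", "ontable(A)", "handempty"},
    "add": {"holding(A)"},
    "del": {"clear(A)", "ontable(A)", "handempty"},
}

def applicable(state, action):
    # Classical assumption: the full state is observable, so a subset test suffices.
    return action["pre"] <= state

def apply(state, action):
    # Deterministic, instantaneous effects: nothing changes except what the
    # action explicitly adds or deletes.
    return (state - action["del"]) | action["add"]

state = {"clear(A)", "ontable(A)", "handempty"}
if applicable(state, BLOCKS_ACTION):
    state = apply(state, BLOCKS_ACTION)   # -> {"holding(A)"}
```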

[Figure 1. Five Dimensions Characterizing Automated Planning Systems Augmented with a Learning Component. CSP = constraint-satisfaction programming. EBL = explanation-based learning. SAT = satisfiability.
Planning aspects — problem type: classical planning (static world, deterministic, fully observable, instantaneous actions, propositional), beyond-classical modes, and full-scope planning (dynamic world, stochastic, partially observable, asynchronous goals, metric/continuous); planning approach: state-space search, conjunctive [PRODIGY, HSPR, FF] or disjunctive [GRAPHPLAN, STAN, IPP]; plan-space search [SNLP, TWEAK, UCPOP]; and compilation approaches: CSP [GP-CSP], SAT [SATPLAN, BLACKBOX], and integer programming.
Learning aspects — planning-learning goal: speed up planning, improve plan quality, or learn or improve domain theory; learning phase: before planning starts, during the planning process, or during plan execution; type of learning: analytic (static analysis and abstractions; explanation-based learning; derivational analogy/case based; analogic), inductive (decision tree; inductive logic programming; neural network; Bayesian learning; reinforcement learning), and multistrategy (analytic and induction; EBL and inductive logic programming; EBL and reinforcement learning).]


Planning Approach

Planning as a subfield of AI has roots in Newell and Simon's 1960-era problem-solving system GPS and in theorem proving. At a high level, planning can be viewed as either problem solving or theorem proving. Planning methods can further be seen as either search processes or model checking. Among planners most commonly characterized by search mode, there are two broad categories: (1) search in state space and (2) search in a space of plans. It is possible to further partition current state-space planners into those that maintain a conjunctive state representation and those that search in a disjunctive representation of possible states.

Planners most generally characterized as model checkers (although they also conduct search) involve recompiling the planning problem into a representation that can be tackled by a particular problem-solution engine. These systems can be partitioned into three categories: (1) satisfiability (SAT), (2) constraint-satisfaction problems (CSPs), and (3) integer linear programming (IP). Figure 1 lists these three methods along with representative planning systems for each. These categories are not entirely disjoint for purposes of classifying planners because some systems use a hybrid approach or can be viewed as examples of more than one method. GRAPHPLAN (Blum and Furst 1997), for example, can be seen as either a dynamic CSP or as a conductor for disjunctive state-space search (Kambhampati 2000). BLACKBOX (Kautz and Selman 1999) uses GRAPHPLAN's disjunctive representation of states and iteratively converts the search into a SAT problem.
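As a toy illustration of the compilation idea (our own construction, not any cited system's encoding), a one-step planning problem can be posed as CNF clauses over step-indexed propositions and handed to a SAT engine:

```python
from itertools import product

# Toy compilation of a one-step planning problem to SAT (illustrative only).
# Variables: 1 = p@0 (initial fluent), 2 = a@0 (action), 3 = q@1 (goal fluent).
clauses = [
    [1],            # initial state: p holds at step 0
    [3],            # goal: q holds at step 1
    [-2, 1],        # a@0 -> p@0  (an action implies its precondition)
    [-2, 3],        # a@0 -> q@1  (an action implies its effect)
    [-3, 2],        # q@1 -> a@0  (frame: only a can achieve q)
]

def brute_force_sat(clauses, n_vars=3):
    # Stand-in for a real SAT engine such as those BLACKBOX calls.
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return bits
    return None

print(brute_force_sat(clauses))  # (True, True, True): take action a at step 0
```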

Goal of Planner's Learning Component

There is a wide variety of targets that the learning component of a planning system might aim toward, such as learning search-control rules, learning to avoid dead-end or unpromising states, or improving an incomplete domain theory. As indicated in figure 1, they can be categorized broadly into one of three groups: (1) learning to speed up planning, (2) learning to elicit or improve the planning domain theory, or (3) learning to improve the quality of the plans produced (where quality can have a wide range of definitions).

Learning and Improving Domain Theory

Automated planning implies the presence of a domain theory—the descriptions of the actions available to the planner. When an exact model of how an agent's actions affect its world is unavailable (a nonclassical planning problem), there are obvious advantages to a planner that can evolve its domain theory by learning. Few interesting environments are simple and certain enough to admit a complete model of their physics, so it's likely that even "the best laid plans" based on a static domain theory will occasionally (that is, too often) go astray. Each such instance, appropriately fed back to the planner, provides a learning opportunity for evolving the domain theory toward a version more consistent with the actual environment in which its plans must succeed.

Even in classical planning, the designer of a problem domain generally has many valid alternative ways of specifying the actions, and it is well known that the exact form of the action descriptions can have a large impact on the efficiency of a given planner on a given problem. Even if the human designer can identify some of the complex ways in which the actions in a domain description will interact, he or she will likely be faced with trade-offs between efficiency and factors such as compactness, comprehensibility, and expressiveness.

Planning Speedup

In all but the most trivial of problems, a planner will have to conduct considerable search to construct a solution, in the course of which it will be forced to backtrack numerous times. The primary goals of speedup learning are to avoid unpromising portions of the search space and bias the search in directions most likely to lead to high-quality plans.


Improving Plan Quality

This category ranges from learning to bias the planner toward plans with a specified attribute or metric value to learning a user's preferences in plans and variations of mixed-initiative planning.

Planning Phase in Which Learning Is Conducted

At least three opportunities for learning present themselves over the course of a planning and execution cycle: (1) before planning starts, (2) during the process of finding a valid plan, and (3) during the execution of a plan.

Learning before Planning Starts

Before the solution search even begins, the specification of the planning problem itself presents learning opportunities. This phase is closely connected to the aspect of learning and improving the domain theory but encompasses only preprocessing of a given domain theory. It is done offline and produces a modified domain that is useful for all future problems in the domain.

Learning during the Process of Finding a Valid Plan

Planners capable of learning in this mode have been augmented with some means of observing their own decision-making process. They then take advantage of their experience during planning to expedite further planning or improve the quality of the plans generated. The learning process itself can be either online or offline.

Learning during the Execution of a Plan

A planner has yet another opportunity to improve its performance when it is an embedded component of a system that can execute a plan and provide sensory feedback. A system that seeks to improve an incomplete domain theory would conduct learning in this phase, as might a planner seeking to improve plan quality based on actual execution experience. The learning process itself can be either online or offline.

Type of Learning

The machine learning techniques themselves can be classified in a variety of ways, irrespective of the learning goal or the planning phase in which they might be used. Two of the broadest traditional class distinctions that can be drawn are between so-called inductive (or empirical) methods and deductive (or analytic) methods. In figure 1, we broadly partition the machine learning–techniques dimension into these two categories along with a multistrategy approach. We then consider additional properties that can be used to characterize a given method. The inductive-deductive classification is drawn based on the following formulations of the learning problem:

Inductive learning: The learner is confronted with a hypothesis space H and a set of training examples D. The desired output is a hypothesis h from H that is consistent with these training examples.

Analytic learning: The learner is confronted with the same hypothesis space and training examples as for inductive learning. However, the learner has an additional input: a domain theory B composed of background knowledge that can be used to help explain observed training examples. The desired output is a hypothesis h from H that is consistent with both the training examples D and the domain theory B.
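In symbols (a standard restatement of the two formulations above, with training set D a set of labeled examples ⟨x_i, f(x_i)⟩ for a target function f):

```latex
\textbf{Inductive:} \quad \text{find } h \in H \text{ such that }
  h(x_i) = f(x_i) \;\; \forall\, \langle x_i, f(x_i)\rangle \in D.

\textbf{Analytic:} \quad \text{find } h \in H \text{ such that }
  h(x_i) = f(x_i) \;\; \forall\, \langle x_i, f(x_i)\rangle \in D
  \;\text{ and }\; B \land h \text{ is consistent (} B \land h \not\vdash \bot \text{)}.
```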

Understanding the advantages and disadvantages of applying a given machine learning technique to a given planning system can help to make sense of any research bias that becomes apparent in the survey tables. The primary types of analytic learning systems developed to date, along with their relative strengths and weaknesses and an indication of their inductive biases, are listed in table 1. The major types of pure inductive learning systems are similarly described in table 2. Admittedly, the various subcategories within these tables are not disjoint, and they don't nicely partition the entire class (inductive or analytic).

The research literature itself conflicts at times about what constitutes learning in a given implementation, so tables 1 and 2 reflect the decisions made in this regard for this study.

The classification scheme we propose for learning-augmented planning systems is perhaps most inadequate when it comes to reinforcement learning. We discuss this special case, in which planning and learning are inextricably intertwined, in the sidebar "Reinforcement Learning: The Special Case."

Analogical learning is represented in table 1 only by a specialized and constrained form known as derivational analogy and the closely related case-based reasoning formalism. More flexible and powerful forms of analogy can be envisioned (compare Hofstadter and Marshall [1996, 1993]), but the lack of active research in this area within the machine learning community effectively eliminates more general analogy as a useful category in our learning-in-planning survey.

The three columns for each technique given in tables 1 and 2 give a sense of the degree to which the method can be effective when applied to a given learning problem, in our case, automated planning. Two columns summarize the relative strengths and weaknesses of each technique. The column headed Models refers to the type of function or structure that the method was designed to represent or process. A method chosen to learn a particular function is not well suited if it is either incapable of expressing the function or is inherently much more expressive than required. This choice of representation involves a crucial trade-off: a very expressive representation that allows the target function to be represented as closely as possible will also require more training data to choose among the alternative hypotheses it can represent.

The heart of the learning problem is how to successfully generalize from examples. Analytic learning leans on the learner's background knowledge to analyze a given training instance to discern the relevant features. In many domains, such as the stock market, complete and correct background knowledge is not available. In these cases, inductive techniques that can discern regularities over many examples in the absence of a domain model can prove useful. One possible motivation for adopting a multistrategy approach is that analytic learning methods generate logically justified hypotheses, but inductive methods generate statistically justified hypotheses. The logical justifications fall short when the prior knowledge is flawed, and the statistical justifications are suspect when data are scarce or assumptions about distributions are questionable.




Table 1. Characterization of the Most Common Analytic Learning Techniques.

Nogood Learning (Memoization, Caching)
  Models: inconsistent states and sets of fluents.
  Strengths: simple, fast learning; generally low computational overhead; practical, widely used.
  Weaknesses: low-strength learning (each nogood typically prunes small sections of search space); difficult to generalize across problems; memory requirements can be high.

Explanation-Based Learning (EBL)
  Models: search-control rules; domain refinement.
  Strengths: uses a domain theory (the available background knowledge); can learn from a single training example; if-then rules are generally intuitive (readable); widely used.
  Weaknesses: requires a domain theory, and an incorrect domain theory can lead to incorrect deductions; rule utility problem.

Static Analysis and Abstractions Learning
  Models: existing problem/domain invariants or structure.
  Strengths: performed offline; benefits generally available for all subsequent problems in the domain.
  Weaknesses: benefits vary greatly depending on domain and problem.

Derivational Analogy / Case-Based Reasoning (CBR)
  Models: similarity between the current state and previously cataloged states.
  Strengths: holds potential for shortcutting much planning effort where similar problem states arise frequently; possibly extendable to full analogy.
  Weaknesses: large space required as the case library builds; case-matching overhead; revising an old plan can be costly.
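The search-control rules modeled by EBL are typically if-then preferences over planner decisions. A hypothetical, blocks-world-flavored example of our own (in the spirit of PRODIGY-style control rules, not any cited system's actual syntax):

```python
# Hypothetical EBL-derived control rule: the explanation of one solved
# training problem is generalized into a reusable operator preference.
def prefer_operator(goal, state):
    """If the goal is holding(x) and x is clear and on the table, prefer
    pickup(x); the proof of the training example never needed unstack here."""
    pred, *args = goal
    if pred == "holding":
        x = args[0]
        if ("clear", x) in state and ("ontable", x) in state:
            return ("pickup", x)
    return None   # no learned rule fires; fall back to default search order
```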




Table 2. Characterization of the Most Common Inductive Learning Techniques.

Decision Tree Learning
  Models: discrete-valued functions, classification problems.
  Strengths: robust to noisy data and missing values; learns disjunctive clauses; if-then rules are easily understandable; practical, widely used.
  Weaknesses: poor at approximating real-valued or vector-valued functions (essentially propositional); incapable of learning relational predicates.

Artificial Neural Networks
  Models: discrete-, real-, and vector-valued functions.
  Strengths: robust to noisy and complex data and to errors in data.
  Weaknesses: long training times are common; the learned target function is largely inscrutable.

Inductive Logic Programming
  Models: first-order logic, theories as logic programs.
  Strengths: robust to noisy data and missing values; more expressive than propositional-based learners; able to generate new predicates; if-then rules (Horn clauses) are easily understandable.
  Weaknesses: a large training sample size might be needed to acquire an effective set of predicates; rule utility problem.

Bayesian Learning
  Models: probabilistic inference; hypotheses that make probabilistic predictions.
  Strengths: readily combines prior knowledge with observed data; modifies hypothesis probability incrementally based on each training example.
  Weaknesses: requires large initial probability sets; high computational cost to obtain the Bayes optimal hypothesis.

Reinforcement Learning
  Models: control policy to maximize rewards.
  Strengths: fits the MDP setting; domain theory not required; handles actions with nondeterministic outcomes; learns an optimal policy from nonoptimal training sets; facilitates life-long learning.
  Weaknesses: depends on a real-valued reward signal for each transition; difficulty handling large state spaces; convergence can be slow; space requirements can be huge.
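To make the reinforcement-learning row concrete, here is a minimal sketch of the one-step Q-learning update (Watkins 1989). The tabular environment interface env.reset(), env.terminal(s), and env.step(s, a) -> (s2, r) is our own illustrative assumption:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular one-step Q-learning: a control policy is learned directly from
    sampled transitions and rewards, with no domain theory required."""
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    for _ in range(episodes):
        s = env.reset()
        while not env.terminal(s):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r = env.step(s, a)
            # move Q(s, a) toward r + gamma * max over a' of Q(s2, a')
            best_next = max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```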

We next consider the learning-in-planning work that has been done in light of the characterization structure given in figure 1.

What Role Has Learning Played in Planning?

We report here the results of an extensive survey of AI research literature focused on applications of machine learning techniques to planning. Research in the area of machine learning goes back at least as far as 1959, with Arthur Samuel's (1959) checkers-playing program that improved its performance through learning. It is noteworthy that perhaps the first work in what was to become the AI field of planning (STRIPS [Fikes and Nilsson 1971]) was quickly followed by a learning-augmented version that could improve its performance by analyzing its search experience (Fikes, Hart, and Nilsson 1972). Space considerations preclude an all-inclusive survey for this 30-year span, but we wanted to list either seminal studies in each category or a typical representative study if the category has many.

It is difficult to present the survey results in a two-dimensional (2D) format such that the five dimensions represented in figure 1 are usefully reflected. We used three different formats, emphasizing different combinations and orderings of the figure 1 dimensions: first, a set of three tables organized around just two dimensions, (1) type of learning and (2) type of planning; second, a set of tables reflecting all five dimensions for each relevant study in the survey; and third, a graphic representation providing a visual mapping of the studies' demographics along the five dimensions. We discuss each of these representations in the following subsections.

Survey Tables according to Learning Type and Planning Type

Table 3A deals with studies focused primarily on analytic (deductive) learning in its various forms, and table 3B is concerned with inductive learning. Table 3C addresses studies and multistrategy systems that aim at some combination of analytic and inductive techniques. All studies and publications appearing in these tables are listed in full in the reference section.

Table 3A. Analytic Learning Applications and Studies.
Entries noted as "beyond classical" concern planners applied to problems beyond classical planning (marked by heavy shading in the original table). Implemented system and program names appear in all caps, and underlying planners and learning subsystems appear in brackets.

Static / Domain Analysis and Abstractions
  State space: learning abstractions — Sacerdoti (1974) ABSTRIPS; Knoblock (1990) ALPINE. Static analysis, domain invariants — Dawson and Siklossy (1977) REFLECT; Etzioni (1993) STATIC [PRODIGY]; Perez and Etzioni (1992) DYNAMIC (with EBL) [PRODIGY]; Nebel, Koehler, and Dimopoulos (1997) RIFO; Gerevini and Schubert (1998) DISCOPLAN; Fox and Long (1998, 1999) STAN/TIM [GRAPHPLAN]; Rintanen (2000).
  Plan space: Smith and Peot (1993) [SNLP]; Gerevini and Schubert (1996) [UCPOP].

Explanation-Based Learning (EBL)
  General applications: general problem solving (chunking) — Laird et al. (1987) SOAR; Horn clause rules — Kedar-Cabelli (1987) PROLOG-EBG; symbolic integration — Mitchell et al. (1986) LEX-2 (see also multistrategy).
  State space: Fikes and Nilsson (1972) STRIPS; Minton et al. (1989) PRODIGY; Gratch and DeJong (1992) COMPOSER [PRODIGY]; Bhatnagar and Mostow (1994) FAILSAFE; Borrajo and Veloso (1997) HAMLET (see also multistrategy); Kambhampati (2000) GRAPHPLAN-EBL; permissive real-world plans (beyond classical) — Bennett and DeJong (1996) GRASPER.
  Plan space: Chien (1989); Kambhampati, Katukam, and Qu (1996) UCPOP-EBL.
  Compilation (CSP/SAT/IP): nogood learning — Kautz and Selman (1999) BLACKBOX (using RELSAT); Do and Kambhampati (2001) GP-CSP [GRAPHPLAN]; Wolfman and Weld (1999) LPSAT [RELSAT].

Analogical / Case-Based Reasoning (CBR)
  General applications: Jones and Langley (1995) EUREKA; microdomain analogy maker — Hofstadter and Marshall (1993, 1996) COPYCAT; conceptual design — Sycara et al. (1992) CADET; legal reasoning by analogy — Ashley and McLaren (1995) TRUTH-TELLER; Ashley and Aleven (1997) CATO; Kakuta et al. (1997).
  State space: CBR derivational — Veloso and Carbonell (1993) PRODIGY/ANALOGY; learning various abstraction-level cases — Bergmann and Wilke (1996) PARIS; user-assisted planning — Avesani, Perini, and Ricci (2000) CHARADE; CBR transformational — Hammond (1989) CHEF; Kambhampati and Hendler (1992) PRIAR; Hanks and Weld (1995) SPA; Leake, Kinley, and Wilson (1996) (see also multistrategy).
  Plan space: CBR derivational — Ihrig and Kambhampati (1996) [UCPOP]; with EBL — Ihrig and Kambhampati (1997) [UCPOP].


Table 3B. Inductive Learning Applications and Studies.
CSP = constraint-satisfaction programming. DT = decision tree. IP = integer linear programming. NN = neural network. SAT = satisfiability. Entries noted as "beyond classical" feature planners applied to problems beyond classical planning (marked by heavy shading in the original table). Implemented system and program names appear in all caps, and underlying planners and learning subsystems appear in brackets.

Propositional Decision Trees
  General applications: concept learning — Hunt, Marin, and Stone (1966) CLS; general DT learning — Quinlan (1986) ID3; Quinlan (1993) C4.5; Khardon (1999) L2ACT; Cohen and Singer (1999) SLIPPER.
  State space: learning operators for real-world robotics, clustering (beyond classical) — Schmill, Oates, and Cohen (2000) [TBA for inducing the decision tree].

Real-Valued Neural Networks
  General applications: Hinton (1989); symbolic rules from NNs — Craven and Shavlik (1993); reflex/reactive control — Pomerleau (1993) ALVINN.

Inductive Logic Programming (ILP, first-order logic)
  General applications: Hornlike clauses — Quinlan (1990) FOIL; Muggleton and Feng (1990) GOLEM; Lavrac, Dzeroski, and Grobelnik (1991) LINUS.
  State space: Leckie and Zukerman (1998) GRASSHOPPER [PRODIGY]; Zelle and Mooney (1993) (see also multistrategy); Reddy and Tadepalli (1999) ExEL.
  Plan space: Estlin and Mooney (1996) (see also multistrategy).
  Compilation: Huang, Selman, and Kautz (2000) (see also multistrategy).

Bayesian Learning
  General applications: training Bayesian belief networks with unobserved variables — Dempster, Laird, and Rubin (1977) EM; text classification — Lang (1995) NEWSWEEDER; predicting run time of problem solvers for decision-theoretic control — Horvitz et al. (2001).

Other Inductive Learning
  State space: action strategies and Rivest's decision list learning — Khardon (1999); Martin and Geffner (2000); plan rewriting — Ambite, Knoblock, and Minton (2000) PBR.

Reinforcement Learning (RL)
  General applications: Sutton (1988) TD/TD(λ); Watkins (1989) Q-learning; Barto, Bradtke, and Singh (1995) real-time dynamic programming; Dearden, Friedman, and Russell (1998) Bayesian Q-learning; Dietterich and Flann (1995) (see also multistrategy).
  State space (beyond classical): incremental dynamic programming — Sutton (1991) DYNA; planning with learned operators — Garcia-Martinez and Borrajo (2000) LOPE.


Table 3C. Multistrategy Learning Applications and Studies.
CSP = constraint-satisfaction programming. EBL = explanation-based learning. ILP = inductive logic programming. NN = neural network. RL = reinforcement learning. SAT = satisfiability. Entries noted as "beyond classical" feature planners applied to problems beyond classical planning (marked by heavy shading in the original table). Implemented system and program names appear in all caps, and underlying planners and learning subsystems appear in brackets.

Analytic and Inductive
  General applications: symbolic integration — Mitchell, Keller, and Kedar-Cabelli (1986) LEX-2; learning CSP variable ordering — Zweben et al. (1992) GERRY; incorporating symbolic knowledge in neural networks — Shavlik and Towell (1989) KBANN; Fu (1989); learning Horn clause sets focused by a domain theory — Pazzani, Brunk, and Silverstein (1991) FOCL; refining domain theories using empirical data — Ourston and Mooney (1994) EITHER; neural networks and fuzzy logic to implement analogy — Hollatz (1999); genetic, lazy RL, k-nearest neighbor — Sheppard and Salzberg (1995); CBR and induction — Leake, Kinley, and Wilson (1996) DIAL.
  State space: learn/refine operators (beyond classical) — Carbonell and Gil (1990), Gil (1994) EXPO [PRODIGY]; Wang (1996a, 1996b) OBSERVER [PRODIGY]; McCluskey, Richardson, and Simpson (2002) OPMAKER. EBL and induction — Calistri-Yeh, Segre, and Sturgill (1996) ALPS; Borrajo and Veloso (1997) HAMLET [PRODIGY]; Zimmerman and Kambhampati (1999, 2002) EGBG, PEGG [GRAPHPLAN]. Deduction, induction, and genetic — Aler, Borrajo, and Isasi (1998) HAMLET-EvoCK [PRODIGY]; Aler and Borrajo (2002) HAMLET-EvoCK [PRODIGY].

Explanation-Based Learning and Neural Networks
  General applications: domain theory cast in neural network form — Mitchell and Thrun (1995) EBNN.

Explanation-Based Learning and Inductive Logic Programming
  General applications: search control for logic programs — Cohen (1990) AxA-EBL.
  State space: Zelle and Mooney (1993) DOLPHIN [PRODIGY/FOIL].
  Plan space: Estlin and Mooney (1996) SCOPE [FOIL].
  Compilation: EBL, ILP, and some static analysis — Huang, Selman, and Kautz (2000) [BLACKBOX/FOIL].

Explanation-Based Learning and Reinforcement Learning
  State space: Dietterich and Flann (1997) EBRL policies (beyond classical).

The table rows feature the major learning types outlined in tables 1 and 2, occasionally further subdivided as indicated in the leftmost column. The second column contains a listing of some of the more important nonplanning studies and implementations of the learning technique in the first column. These General Applications were deemed particularly relevant to planning, and of course, the list is highly abridged. Comparing the General Applications column with the Planning columns for each table provides a sense of which machine learning methods have been applied within the planning community. The three columns making up the Planning Applications partition subdivide the applications into state space; plan space; and CSP, SAT, and IP planning. Studies dealing with planning problems beyond classical planning (as defined in Planning Problem Type earlier) are noted as such in these tables.

Table 3C, covering multistrategy learning, reflects the fact that the particular combination of techniques used in some studies could not always be easily subcategorized relative to the analytic and inductive approaches of tables 3A and 3B. This is often the case, for example, with an inductive learning implementation that exploits the design of a particular planning system. Examples include HAMLET (Borrajo and Veloso 1997), which exploits the search tree produced by the PRODIGY 4.0 planning system to lazily learn search-control heuristics, and EGBG and PEGG (Zimmerman and Kambhampati 2002, 1999), which exploit GRAPHPLAN's use of the planning-graph structure to learn to shortcut the iterative search episodes. Studies such as these appear in table 3C under the broader category, analytic and inductive.

In addition to classifying the studies surveyed along the learning-type and planning-type dimensions, these tables illustrate several foci of this corpus of work. For example, the preponderance of research in analytic learning as it applies to planning, rather than inductive learning styles, is apparent, as is the heavy weighting in the area of state-space planning. We return to such issues when discussing implications for future research in the final section.

Survey Tables Based on All Five Dimensions

The same studies appearing in tables 3A, 3B, and 3C are tabulated in tables 4A and 4B according to all five dimensions in figure 1. We have used a block structure within the tables to emphasize shared attribute values wherever possible, given the left-to-right ordering of the dimensions. Here, the two dimensions not represented in the previous set of tables, Planning-Learning Goal and Learning Phase, are ordered first, so this block structure reveals the most about the distribution of work across attributes in these dimensions. It's apparent that the major focus of learning-in-planning work has been on speedup, with much less attention given to learning to improve plan quality or to building and improving the domain theory. Also obvious is the extent to which research has focused on learning prior to or during planning, with scant attention paid to learning during plan execution.



Table 4A. Survey Studies Mapped across All Five Dimensions, Part 1.
CSP = constraint-satisfaction programming. EBL = explanation-based learning. LP = linear programming. SAT = satisfiability. Implemented system and program names appear in all caps, and underlying planners and learning subsystems appear in brackets. (In the original table, heavily shaded blocks mark planners applied to problems beyond classical planning.)

Planning-learning goal: speed up planning
  Learning phase: before planning starts
    Type of learning: analytic — static analysis
      Plan space: Smith and Peot (1993) [SNLP]; Gerevini and Schubert (1996) [UCPOP].
      State space: Etzioni (1993) STATIC [PRODIGY]; Dawson and Siklossy (1977) REFLECT; Nebel, Koehler, and Dimopoulos (1997) RIFO; Fox and Long (1998, 1999), Rintanen (2000) STAN/TIM [GRAPHPLAN].
    Type of learning: static analysis — learn abstractions
      State space: Sacerdoti (1974) ABSTRIPS; Knoblock (1990) ALPINE [PRODIGY].
  Learning phase: before and during planning
    Type of learning: static analysis and EBL
      State space: Perez and Etzioni (1992) DYNAMIC [PRODIGY].
  Learning phase: during planning
    Type of learning: analytic — EBL
      State space: Fikes and Nilsson (1972) STRIPS; Minton (1989) PRODIGY/EBL; Gratch and DeJong (1992) COMPOSER [PRODIGY]; Bhatnagar (1994) FAILSAFE; Kambhampati (2000) GRAPHPLAN-EBL.
      Plan space: Chien (1989); Kambhampati, Katukam, and Qu (1996) UCPOP-EBL.
      Compilation, SAT: nogood learning — Kautz and Selman (1999) BLACKBOX.
      Compilation, LP and SAT: Wolfman and Weld (1999) LPSAT [RELSAT].
      Compilation, CSP: nogood learning — Do and Kambhampati (2001) GP-CSP [GRAPHPLAN].
    Type of learning: analytic — analogical / case-based reasoning
      State space: learning various abstraction-level cases — Bergmann and Wilke (1996) PARIS; user-assisted planning — Avesani, Perini, and Ricci (2000) CHARADE; transformational analogy/adaptation — Hammond (1989) CHEF, Kambhampati and Hendler (1992) PRIAR, Hanks and Weld (1995) SPA, Leake, Kinley, and Wilson (1996) DIAL; derivational analogy/adaptation — Veloso and Carbonell (1993) PRODIGY/ANALOGY.
      Plan space: Ihrig and Kambhampati (1996) [UCPOP]; with EBL — Ihrig and Kambhampati (1997) [UCPOP].


Table 4B. Survey Studies Mapped across All Five Dimensions, Part 2.
EBL = explanation-based learning. ILP = inductive logic programming. RL = reinforcement learning. SAT = satisfiability. Implemented system and program names appear in all caps, and underlying planners and learning subsystems appear in brackets. (In the original table, heavily shaded blocks mark planners applied to problems beyond classical planning.)

Planning-learning goal: speed up planning (continued)
  Learning phase: during planning
    Type of learning: inductive — ILP
      State space: Leckie and Zukerman (1998) GRASSHOPPER [PRODIGY]; Reddy and Tadepalli (1999) ExEL.
    Type of learning: other induction
      State space: action strategies and Rivest's decision list learning of policies — Khardon (1999), Martin and Geffner (2000); Calistri-Yeh, Segre, and Sturgill (1996) ALPS; CBR and induction — Leake, Kinley, and Wilson (1996) DIAL.
    Type of learning: multistrategy — analytic and inductive
      State space: Zimmerman and Kambhampati (1999) EGBG [GRAPHPLAN].
  Learning phase: before and during planning
    Type of learning: EBL, ILP, and static analysis
      Compilation, SAT: Huang, Selman, and Kautz (2000) [BLACKBOX/FOIL].
  Learning phase: during planning
    Type of learning: EBL and ILP
      State space: Zelle and Mooney (1993) DOLPHIN [PRODIGY/FOIL].
      Plan space: Estlin and Mooney (1996) SCOPE [FOIL].

Planning-learning goal: speed up planning and improve plan quality
  Learning phase: during planning
    Type of learning: EBL and inductive
      State space: Borrajo and Veloso (1997) HAMLET [PRODIGY]; deductive-inductive and genetic — Aler and Borrajo (1998, 2002) HAMLET-EvoCK [PRODIGY]; Zimmerman and Kambhampati (2002) PEGG [GRAPHPLAN].
    Type of learning: inductive (analysis of plan differences)
      Plan rewriting: Ambite, Knoblock, and Minton (2000) PBR.

Planning-learning goal: learn or improve domain theory (beyond classical planning)
  Learning phase: before planning starts
    Type of learning: (propositional) decision trees
      State space: learning operators for real-world robotics, clustering — Schmill, Oates, and Cohen (2000) [TBA for inducing the decision tree].
  Learning phase: during plan execution
    Type of learning: analytic — EBL
      Permissive real-world plans: Bennett and DeJong (1996) GRASPER.
    Type of learning: multistrategy — analytic and inductive
      Learning/refining operators: Wang (1996a, 1996b) OBSERVER [PRODIGY]; McCluskey, Richardson, and Simpson (2002) OPMAKER; Carbonell and Gil (1990), Gil (1994) EXPO [PRODIGY].

Planning-learning goal: learn or improve domain theory and improve plan quality
  Learning phase: during planning
    Type of learning: EBL and RL
      State space: Dietterich and Flann (1997) EBRL.
    Type of learning: inductive — reinforcement learning
      Incremental dynamic programming: Sutton (1991) DYNA; planning with learned operators — Garcia-Martinez and Borrajo (2000) LOPE.


Graphic Analysis of Survey

There are obvious limitations to what can readily be gleaned from any tabular presentation of a data set across more than two or three dimensions. To more easily visualize patterns and relationships in learning-in-planning work, we have devised a graphic method of depicting the corpus of work in this survey with respect to the five dimensions given in figure 1. Figure 2 illustrates this method of depiction by mapping two studies from the survey onto a version of figure 1.

In this manner, every study or project covered in the survey has been mapped onto at least one five-node, directed subgraph of figure 3 (classical planning systems) or figure 4 (systems designed to handle problems beyond the classical paradigm). The edges express which combinations of the figure 1 dimensional attributes were actually realized in a system covered by the survey.

Besides providing a visual characterization of the corpus of research in learning in planning, this graphic presentation mode permits quick identification of all planner-learning system configurations that embody any of the aspects of the five dimensions (nodes). For example, because the survey tables don't show all possible values in each dimension's range, aspects of learning in planning that have received scant attention are not obvious until one glances at the graphs, which entails simply observing the edges incident on any given node. Admittedly, a disadvantage of this presentation mode is that the specific planning system associated with a given subgraph cannot be extracted from the figure alone. However, the tables can assist in this regard.


[Figure 2. Example Graphic Mapping of Two Learning-in-Planning Systems. EBL = explanation-based learning. ILP = inductive logic programming. The layout of the five dimensions in figure 1 and their range of values can be used to map the research work covered in the survey tables. By way of example, the PRODIGY-EBL system (Minton et al. 1989) is represented by the top connected series of gray lines; it's a classical planning system that conducts state-space search, and it aims to speed up planning using explanation-based learning (EBL) during the planning process. Tracing the subgraph from the left, the edge picks up the line thickness at the State-Space node and the gray shade of the Speed-Up Planning node. The SCOPE system (Estlin and Mooney 1996) is then represented as the branched series of thicker (2-point) lines. SCOPE is a classical planning system that conducts plan-space search, and the goal of its learning subsystem is to both speed up planning and improve plan quality. Thus, the Plan-Space node branches to both Planning-Learning Goal nodes. All SCOPE's learning occurs during the planning process, using both EBL and inductive logic programming (ILP). As such, the edges converge at the During Planning Process node, but both edges persist to connect with the EBL and ILP node.]


Learning within the Classical Planning Framework

Figure 3 indicates with dashed lines and fading those aspects (nodes) of the five dimensions of learning in planning that are not relevant to classical planning. Specifically, learning or improving the domain theory is inconsistent with the classical planning assumption of a complete and correct domain theory. Similarly, the strength of reinforcement learning lies in its ability to handle stochastic environments in which the domain theory is either unknown or incomplete. (Dynamic programming, a close cousin to reinforcement learning methods, requires a complete and perfect domain theory, but because of efficiency considerations, it has remained primarily of theoretical interest with respect to classical planning.)

Broadly, the figure indicates that some form of learning has been implemented with all planning approaches. If we consider the Learning Phase dimension of figure 3, it is obvious that the vast majority of the work to date has focused on learning conducted during the planning process. Work in automatic extraction of domain-specific knowledge through analysis of the domain theory (Fox and Long 1999, 1998; Gerevini and Schubert 1998) constitutes the learning conducted before planning. Not surprisingly, learning in the third phase, during plan execution, is not a focus for classical planning scenarios because this mode has a clear affinity with improving a faulty domain theory—a nonclassical problem.

It is apparent, based on the figure 3 graph in combination with the survey tables, that explanation-based learning (EBL) has been extensively studied and applied to every planning approach and both relevant planning-learning goals. This is perhaps not surprising given that planning presumes the sort of domain theory that EBL can readily exploit. Perhaps more notable is the scant attention paid to inductive learning techniques for classical planners. Although ILP has been extensively applied as a learning tool for planners, other inductive techniques, such as decision tree learning, neural networks, and Bayesian learning, have seen few planning applications.


[Figure 3. Mapping of the Survey Planning-Learning Systems for Classical Planning Problems on the Figure 1 Characterization Structure. The node layout repeats figure 1; as described in the accompanying text, nodes not relevant to classical planning are drawn with dashed lines and fading.]


Learning within a Nonclassical Planning Framework

Figure 4 covers planning systems designed to learn in the wide range of problem classes beyond the classical formulation (noted as "beyond classical" in tables 3A, 3B, 3C, 4A, and 4B). There are, as yet, far fewer such learning-augmented systems, although this area of planning community interest is growing. Those beyond-classical-planning systems that exist extend the classical planning problem in a variety of different ways, but because of space considerations, we have not reflected these variations with separate versions of figure 4 for each combination. Learning in a dynamic, stochastic world is the natural domain of reinforcement learning systems, and as discussed earlier, this popular machine learning field does not so readily fit our five-dimensional learning-in-planning perspective. Figure 4 therefore represents reinforcement learning in a different manner than the other approaches; a single shaded, brick-crosshatched set of edges is used to span the five dimensions. The great majority of reinforcement learning systems to date adopt a state-space perspective, so there is an edge skirting this node. With respect to the planning-learning goal dimension, reinforcement learning can be viewed as both improving plan quality (the process moves toward the optimal policy) and learning the domain theory (it begins without a model of transition probability between states). This view is reflected in figure 4 as the vertical reinforcement-learning edge spanning these nodes under the planning-learning goal dimension. Finally, because reinforcement learning is both rooted in interacting with its environment and takes place during the process of building the plan, there is a vertical edge spanning these nodes under the learning-phase dimension.

Beyond reinforcement learning systems, figure 4 suggests at least three aspects of the learning-in-planning work done to date for nonclassical planning problems: all fielded systems plan using state-space search, most systems conduct learning during the plan-execution phase, and EBL is again the learning method of choice. It is also notable that the only decision tree learning conducted in any planner is based in a nonclassical planning system.

With this overview of where we have been with learning in planning, we next turn our attention to open issues and research directions that beckon.



Nonanalytic Learning Techniques

The survey tables suggest a considerable bias toward analytic learning in planning, which deserves to be questioned. Why is analytic learning so favored? In a sense, a planner using EBL is learning guaranteed knowledge, control information that is provably correct. However, it is well known within the machine learning community that approximately correct knowledge can be at least as useful, particularly if we're careful not to sacrifice completeness. Given the presence of a high-level domain theory, it is reasonable to exploit it to learn. However, large constraints are placed on just what can be learned if the planner doesn't also take advantage of the full planning search experience. The tables and figures of this study indicate the extent to which ILP has been used in this spirit together with EBL. This is a logical marriage of two mature methodologies; ILP in particular has powerful engines for inducing logical expressions, such as FOIL (Quinlan 1990), that can readily be employed. It is curious to note, however, that decision tree learning has been used in only one study in this entire survey, yet this inductive technique is at least as mature and features its own very effective engines, such as ID3 and C4.5 (Quinlan 1986, 1993). In the 1980s, decision tree algorithms were generally not considered expressive enough to capture complex target concepts (such as under what conditions to apply an operator). However, given subsequent evolutions in both decision tree methods and the current opportunities for learning to assist the latest generation of planners, the potential of decision tree learning in planning merits reconsideration.
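To make the possibility concrete, the following minimal sketch (ours, not from any surveyed system) induces a decision tree over logged search states to capture "under what conditions to apply an operator." The state features and labels are hypothetical; any off-the-shelf learner in the ID3/C4.5 family would serve equally well.

```python
# Sketch: inducing "when to apply operator X" as a decision tree.
# Assumes a hypothetical log of search states, each described by simple
# numeric features and labeled by whether applying the operator there
# eventually led to a solution (1) or a dead end (0).
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [num_open_goals, num_unstacked_blocks, depth]
states = [
    [3, 2, 1], [1, 0, 4], [4, 3, 2], [2, 1, 3],
    [5, 4, 1], [1, 1, 5], [3, 3, 2], [2, 0, 4],
]
led_to_solution = [0, 1, 0, 1, 0, 1, 0, 1]  # labels from the search trace

tree = DecisionTreeClassifier(max_depth=3).fit(states, led_to_solution)
print(export_text(tree, feature_names=["open_goals", "unstacked", "depth"]))

def should_apply(state_features):
    # At planning time, the tree acts as a selection rule.
    return bool(tree.predict([state_features])[0])
```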

Learning across Problems

A learning aspect that has largely fallen out of favor in recent years is the compilation and retention of search guidance that can be used across different problems and perhaps even different domains. One of the earliest implementations of this took the form of learning search-control rules (for example, using EBL). There might be two culprits that led to disenchantment with learning this interproblem search control:

First is the utility problem that can surface when too many, or relatively ineffective, rules are learned (a cost-benefit view of this problem is sketched just after this list).

Second is the propositionalization of the planning problem, wherein lifted representations of the domain theory were forsaken for the faster processing of grounded versions involving only propositions. The cost of rule checking and matching in more recent systems that use grounded operators is much lower than for planning systems that handle uninstantiated variables.
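The first culprit is often made concrete with Minton-style cost-benefit bookkeeping: a learned rule pays its way only if the search time it saves outweighs the cost of matching it. A minimal sketch of that accounting follows; the numbers are purely illustrative.

```python
# Sketch of utility bookkeeping for a learned control rule, in the
# spirit of the EBL utility analyses surveyed here. All values would be
# measured empirically during training; the figures below are made up.
def rule_utility(avg_savings, application_freq, avg_match_cost):
    """Expected net benefit of keeping one learned rule."""
    return avg_savings * application_freq - avg_match_cost

# Discard a rule whose measured utility goes negative.
keep = rule_utility(avg_savings=12.0, application_freq=0.05,
                    avg_match_cost=1.5) > 0
print(keep)
```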

Not conceding these hurdles to be insurmountable, we suggest the following research approaches:

One trade-off associated with a move to planning with grounded operators is the loss of generality in the basic precepts that are most readily learned. For example, GRAPHPLAN can learn a great number of "no goods" during search on a given problem, but in their basic form, they are relevant only to that problem; GRAPHPLAN retains no interproblem memory. It is worth considering what might constitute effective interproblem learning for such a system.
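As one illustration of what such interproblem memory could look like, the sketch below stores no-goods in a lifted form so they can be matched in later problems. Whether a lifted no-good remains sound outside its original problem is exactly the question such a design would have to answer; the helper names here are hypothetical, and soundness is assumed, not guaranteed.

```python
# Sketch (ours, not GRAPHPLAN's): interproblem memory for no-goods.
# A no-good is a (level, goal set) pair; constants are replaced by
# typed variables so structurally similar goal sets match later.
from itertools import count

def lift(goals, type_of):
    """on(a,b) -> on(?block1, ?block2); numbering follows sorted order
    so that structurally identical goal sets lift identically."""
    names, fresh, lifted = {}, count(1), []
    for pred, args in sorted(goals):
        new_args = []
        for obj in args:
            if obj not in names:
                names[obj] = "?%s%d" % (type_of[obj], next(fresh))
            new_args.append(names[obj])
        lifted.append((pred, tuple(new_args)))
    return tuple(lifted)

nogood_memory = set()                     # persists across problems

def record_nogood(level, goals, type_of):
    nogood_memory.add((level, lift(goals, type_of)))

def known_nogood(level, goals, type_of):
    return (level, lift(goals, type_of)) in nogood_memory
```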

The rule utility issue faced by analytic learning systems (and possibly all systems that learn search-control rules) can be viewed as the problem of incurring the cost of a large set of sound, exact, and probably overspecific rules. Learning systems that can reasonably relax the soundness criterion for learned rules can move broadly toward a problem goal using generally correct search control. Some of the multistrategy studies reflected in table 3C are relevant to this view to the extent that they attempt to leverage the strengths of both analytic and inductive learning techniques to acquire more useful rules. Initial work with an approach that does not directly depend on a large set of training examples was reported in Kambhampati (1999). Here, a system is described that seeks to learn approximately correct rules by relaxing the constraint of the UCPOP-EBL system that requires regressed failure explanations from all branches of a search subtree before a search-control rule is constructed.
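A sketch of the bookkeeping such a "reasonably relaxed" learner might keep follows; the thresholds are hypothetical. Each rule's empirical accuracy is tracked, and rules are dropped once experience shows them wrong too often.

```python
# Sketch: retaining approximately correct search-control rules by
# empirical validation rather than proof. Thresholds are illustrative.
class ApproxRule:
    def __init__(self, condition, advice):
        self.condition = condition   # predicate over search nodes
        self.advice = advice         # e.g., prefer or prune this branch
        self.fired = 0
        self.correct = 0

    def accuracy(self):
        return self.correct / self.fired if self.fired else 1.0

def update_rules(rules, node, advice_was_right):
    """Called once search reveals whether a rule's advice was right."""
    for r in rules:
        if r.condition(node):
            r.fired += 1
            r.correct += int(advice_was_right)
    # Keep young rules on probation; drop seasoned rules that misfire.
    return [r for r in rules if r.fired < 20 or r.accuracy() > 0.8]
```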

Perhaps the most ambitious approach to learning across problems would be to extend some of the work being done in analogical reasoning elsewhere in AI to the planning field. The goal is to exploit any similarity between problems to speed up solution finding. Current case-based reasoning implementations in planning are capable of recognizing only a narrow range of similarities between an archived partial plan and the current state the planner is working from. Such systems cannot apply knowledge learned in one logistics domain to another, even though a human would find it natural to apply what he or she has learned in solving an AIPS planning competition driver-log problem to a depot problem. We note that transproblem learning has been approached from a somewhat different direction in Fox and Long (1999) using a process of identifying abstract types during domain preprocessing.

Extending Learning to Nonclassical Planning Problems

The preponderance of planning research has been based in classical planning, as is borne out by the survey tables and figures. Historically, this weighting arose because of the need to study a less daunting problem than full-scope planning, and much of the progress realized in classical planning has indeed provided the foundation for advances now being made in nonclassical formulations. It is a reasonable expectation that the body of work in learning methods adapted to classical planning will similarly be modified and extended to nonclassical planning systems. With the notable exception of reinforcement learning, the surface has scarcely been scratched in this regard.

If, as we suggest in the introduction, the recent striking advances in speed for state-of-the-art planning systems lie behind the relative paucity of current research in speedup learning, the focus might soon shift back in this direction. These systems, impressive though they are, demonstrated their speedup abilities in classical planning domains. As research attention shifts to problems beyond the classical paradigm, the greatly increased difficulty of the problems themselves seems likely to renew planning community interest in speedup learning approaches.

New Avenues for Learning in Planning Motivated by Recent Developments in Planning

Recent advances in planning research suggest several aspects of the new generation of planners for which machine learning methods might provide important enhancements. We discuss here three such avenues for learning in planning: (1) offline learning of domain knowledge, (2) learning to improve heuristics, and (3) learning to improve plan quality.

Offline Learning of Domain Knowledge

We have previously noted the high overhead cost of conducting learning online during the course of solving a single problem, relative to the often-short solution times for the current generation of fast and efficient planners. This handicap might help explain more recent interest in offline learning, such as domain analysis, whose results can be reused to advantage over a series of problems within a given domain. The survey results and figure 3 also suggest an area of investigation that has so far been neglected in studies focused on nonclassical planning: the learning of domain invariants before planning starts. This static analysis has been shown to be an effective speedup approach for many classical planning domains, and there is no reason to believe it cannot similarly boost nonclassical planning.
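For instance, one simple invariant class, "exactly one," can be proposed and then checked against sampled reachable states, as in the sketch below. This is purely our illustration; analytic tools such as TIM (Fox and Long 1998) derive invariants from the domain description itself rather than from samples, and sampling can only refute, never prove, an invariant.

```python
# Sketch: proposing and testing an "exactly one" domain invariant
# (each object is at exactly one place) against sampled reachable
# states, before planning starts. Atoms are (predicate, args) tuples.
def exactly_one(states, pred, obj):
    for state in states:
        hits = [a for a in state if a[0] == pred and a[1][0] == obj]
        if len(hits) != 1:
            return False
    return True

sample_states = [
    {("at", ("pkg1", "depot")), ("at", ("truck1", "depot"))},
    {("at", ("pkg1", "truck1")), ("at", ("truck1", "market"))},
]
print(exactly_one(sample_states, "at", "pkg1"))   # True on this sample
```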





On another front, there has been much enthusiasm in parts of the planning community for applying domain-specific knowledge to speed up a given planner (for example, TLPLAN [Bacchus and Kabanza 2000] and BLACKBOX [Kautz and Selman 1998]). This advantage has also been realized in hierarchical task network (HTN) planning systems by supplying domain-specific task-reduction schemas to the planner (SHOP [Nau et al. 1999]). Such leveraging of user-supplied domain knowledge has been shown to greatly decrease planning time for a variety of domains and problems. One drawback of this approach is the burden it places on the user to correctly hand code the domain knowledge ahead of time and in a form usable by the particular planner. Offline learning techniques might be exploited here. If the user provides very high-level domain knowledge in a format readily understandable by humans, the system could learn in supervised fashion to operationalize this background knowledge into the particular formal representation usable by a given target planning system. If the user is not to be burdened with learning the planner's low-level language for knowledge representation, this approach might entail solving sample problems iteratively with combinations of these domain rules to determine both correctness and efficacy.
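A sketch of that iterative vetting loop follows. Everything here is hypothetical: `solve` stands in for any planner that reports success and search effort, and the acceptance criteria (preserve completeness on the samples, reduce total effort) are one plausible choice among many.

```python
# Sketch: vetting candidate operationalizations of the user's
# high-level advice by solving sample problems with and without them.
def vet_rules(candidate_rules, sample_problems, solve):
    kept = []
    for rule in candidate_rules:
        ok, cost, base_cost = True, 0, 0
        for prob in sample_problems:
            solved, nodes = solve(prob, control_rules=kept + [rule])
            solved0, nodes0 = solve(prob, control_rules=kept)
            if solved0 and not solved:   # rule sacrificed completeness here
                ok = False
                break
            cost, base_cost = cost + nodes, base_cost + nodes0
        if ok and cost < base_cost:      # correct and actually faster
            kept.append(rule)
    return kept
```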

An interesting related issue is the question of which types of knowledge are easiest and hardest to learn, which has a direct impact on the types of knowledge that might actually be worth learning. The closely related machine learning notion of sample complexity addresses the number and type of examples that are needed to induce a given concept or target function. To date, the relative difficulty of learning tasks has received little attention with respect to the domain-specific knowledge used by some planners. What are the differences in terms of the sample complexity of learning different types of domain-specific control knowledge? For example, it would be worth categorizing the TLPLAN control rules versus the SHOP/HTN-style schemas in terms of their sample complexity.
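As orientation, the textbook PAC bound for a consistent learner over a finite hypothesis space H ties sample complexity to the size of the space being searched; richer rule languages (larger H) demand correspondingly more training examples. We state it here only to fix ideas:

```latex
% Number of examples m sufficient for a consistent learner to reach
% true error below \epsilon with probability at least 1 - \delta:
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```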

Learning to Improve Heuristics

The credit for both the revival of plan-space planning and the impressive performance of most state-space planners in recent years goes largely to the development of heuristics that guide the planner at key decision points in its search. As such, considerable research effort is focusing on finding more effective domain-independent heuristics and tuning heuristics to particular problems and domains. The role that learning might play in acquiring or refining such heuristics has largely been unexplored. In particular, learning such heuristics inductively during the planning process would seem to hold promise. Generally, the heuristic values are calculated by a linear combination of weighted terms, where the designer chooses both the terms and their weights in hopes of obtaining an equation that will be robust across a variety of problems and domains. The search trace (states visited) resulting from a problem-solving episode could provide the negative and positive examples needed to train a neural network or learn a decision tree. Possible target functions for inductively learning or improving heuristics include term weights that are most likely to lead to higher-quality solutions for a given domain, term weights that will be most robust across many domains, attributes that are most useful for classifying states, exceptions to an existing heuristic such as those used in LRTA* (Korf 1990), and a metalevel function that selects or modifies a search heuristic based on the problem or domain.
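Under the linear-combination view just described, for example, term weights could be tuned from a search trace with a simple delta-rule update, as in the sketch below. Everything here (the features, the target values, the learning rate) is a hypothetical illustration; a neural network or decision tree would be the obvious heavier-duty substitute.

```python
# Sketch: tuning the weights of a linear heuristic
#   h(s) = w1*t1(s) + ... + wk*tk(s)
# from a search trace. States on the found solution path are positive
# examples and should score low (look promising); dead-end states are
# negative and should score high.
def tune_weights(weights, trace, terms, lr=0.1):
    """trace: list of (state, on_solution_path) pairs;
    terms: list of functions mapping a state to a number."""
    for state, positive in trace:
        feats = [t(state) for t in terms]
        h = sum(w * f for w, f in zip(weights, feats))
        target = 0.0 if positive else 10.0   # illustrative target values
        err = target - h
        weights = [w + lr * err * f for w, f in zip(weights, feats)]
    return weights
```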

Multistrategy learning might also play a role here in that the user might provide background knowledge in the form of the base heuristic.

The ever-growing cadre of planning approaches and learning tools, each with its own strengths and weaknesses, suggests another inviting direction for speedup learning. Learning a rule set or heuristic that directs the application of the most effective approach (or multiple approaches) for a given problem could lead to a metaplanning system with capabilities well beyond any individual planner. Interesting steps in this direction have been taken by Horvitz et al. (2001) with the construction and use of Bayesian models to predict the run time of various problem solvers.
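In sketch form, such a metaplanner could be as simple as a learned run-time predictor dispatching among base planners. The nearest-neighbor estimate below is our own stand-in for the Bayesian models of Horvitz et al. (2001); the problem features and run records are hypothetical.

```python
# Sketch of a metaplanner dispatch rule: pick the base planner whose
# predicted run time on this problem is lowest, predicting from the
# most similar problem seen so far.
past_runs = []   # (features, planner_name, runtime) triples

def predict_best(features, planners):
    best, best_time = None, float("inf")
    for name in planners:
        runs = [(f, t) for f, p, t in past_runs if p == name]
        if not runs:
            return name                  # explore untried planners first
        # Estimate: runtime observed on the nearest past problem.
        _, t = min(runs, key=lambda ft: sum((a - b) ** 2
                                            for a, b in zip(ft[0], features)))
        if t < best_time:
            best, best_time = name, t
    return best
```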

Learning to Improve Plan Quality

The survey tables and figures suggest that the issue of improving plan quality using learning has received much less attention in the planning community than speedup learning. However, as planning systems are ported into real-world applications, this concern is likely to become a primary one. Many planning systems that successfully advance into the marketplace will need to interact frequently with human users in ways that have received scant attention in the lab. Such users are likely to have individual biases with respect to plan quality that they can be hard pressed to quantify. These planning systems could be charted in a 2D space with the following two axes: (1) degree of coverage of the issues confronted in a real-world problem, that is, the capability of the system to deal with all aspects of a problem without abstracting them away, and (2) degree of automation, that is, the extent to which the system automatically reasons about the various problem aspects and makes decisions without guidance by the human user.

Figure 5 shows such a chart for current-day planning systems. The ideal planner plotted on this chart would obviously lie in the top-right corner. It is interesting to note that most users, aware that they can't have it all, prefer a system that can, in some sense, handle most aspects of the real-world problem at the expense of full automation. However, most current-day planning systems abstract away large portions of the real-world problem in favor of fully automating what the planner can actually accomplish. In large practical planning environments, fully automated planning is neither feasible nor desirable because users want to observe and control plan generation.

Figure 5. The Coverage versus Automation Trade-Off in Planning Systems.

Some planning systems, such as HICAP (Munoz-Avila et al. 1999) and ALPS (Calistri-Yeh, Segre, and Sturgill 1996), have made inroads toward building an effective interface with their human users. No significant role for learning has been established yet for such systems, but possibilities include learning user preferences with respect to plan actions, intermediate states, and pathways. Given the human inclination to "have it their way," it might be that the best way to tailor an interactive planner will be in the manner of the programming-by-demonstration systems that have recently received attention in the machine learning community (Lau, Domingos, and Weld 2000). Such a system implemented on top of a planner might entail having the user create plans for several problems that the learning system would then parse to learn plan aspects peculiar to the particular user.
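As a minimal illustration of the demonstration-parsing idea (ours, not a description of the Lau, Domingos, and Weld system), user-authored plans could be reduced to simple preference statistics and used to break ties during plan generation:

```python
# Sketch: estimating a user's action preferences from a few
# demonstrated plans, then using them as a tie-breaker in search.
from collections import Counter

def learn_preferences(demonstrated_plans):
    """demonstrated_plans: list of plans, each a list of action names."""
    prefs = Counter()
    for plan in demonstrated_plans:
        prefs.update(plan)
    total = sum(prefs.values())
    return {action: n / total for action, n in prefs.items()}

def prefer(candidates, prefs):
    """Among equally good applicable actions, pick the user's favorite."""
    return max(candidates, key=lambda a: prefs.get(a, 0.0))
```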

Summary and Conclusion

We have presented the results of an extensive survey of research conducted and published since the first application of learning to automated planning was implemented some 30 years ago. In addition to compiling categorized tables of the corpus of work, we have presented a five-dimensional characterization of learning in planning and mapped the studies onto it. This process has clarified the foci of the work in this area and suggested a number of avenues along which the community might reasonably proceed in the future. It is apparent that automated planning and machine learning are well-matched methodologies in a variety of configurations, and we suggest there are a number of these approaches that merit more research attention than they have received to date. We have expanded on several of these possibilities and offered our conjectures about where the more interesting work might lie.

References

Aler, R., and Borrajo, D. 2002. On Control Knowledge Acquisition by Exploiting Human-Computer Interaction. Paper presented at the Sixth International Conference on Artificial Intelligence Planning and Scheduling, 23–27 April, Toulouse, France.

Aler, R.; Borrajo, D.; and Isasi, P. 1998. Genetic Programming and Deductive-Inductive Learning: A Multistrategy Approach. Paper presented at the Fifteenth International Conference on Machine Learning, ICML '98, 24–27 July, Madison, Wisconsin.

Ambite, J. L.; Knoblock, C. A.; and Minton, S. 2000. Learning Plan Rewriting Rules. Paper presented at the Fifth International Conference on Artificial Intelligence Planning and Scheduling, 14–17 April, Breckenridge, Colorado.

Ashley, K., and Aleven, V. 1997. Reasoning Symbolically about Partially Matched Cases. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, IJCAI-97, 335–341. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Avesani, P.; Perini, A.; and Ricci, F. 2000. Interactive Case-Based Planning for Forest Fire Management. Applied Intelligence 13(1): 41–57.

Barto, A.; Bradtke, S.; and Singh, S. 1995. Learning to Act Using Real-Time Dynamic Programming. Artificial Intelligence 72(1): 81–138.


Articles

92 AI MAGAZINE

Deg

ree

of A

uto

mat

ion

Coverage of Real-World Aspects

User‘sPriority

IDEAL MostCurrentPlanningSystems

Figure 5. The Coverage versus Automation Trade-Off in Planning Systems.

Page 21: Learning-Assisted Automated Planning - Subbarao Kambhampati

cial Intelligence 72(1): 81–138.

Ashley, K. D., and McLaren, B. 1995. Reasoning with Reasons in Case-Based Comparisons. In Proceedings of the First International Conference on Case-Based Reasoning (ICCBR-95), 133–144. Berlin: Springer.

Bacchus, F., and Kabanza, F. 2000. Using Temporal Logics to Express Search Control Knowledge for Planning. Artificial Intelligence 116(1–2): 123–191.

Bennett, S. W., and DeJong, G. F. 1996. Real-World Robotics: Learning to Plan for Robust Execution. Machine Learning 23(2–3): 121–161.

Bergmann, R., and Wilke, W. 1996. On the Role of Abstractions in Case-Based Reasoning. In Proceedings of EWCBR-96, the European Conference on Case-Based Reasoning, 28–43. New York: Springer.

Bhatnagar, N., and Mostow, J. 1994. Online Learning from Search Failures. Machine Learning 15(1): 69–117.

Blum, A., and Furst, M. L. 1997. Fast Planning through Planning Graph Analysis. Artificial Intelligence 90(1–2): 281–300.

Borrajo, D., and Veloso, M. 1997. Lazy Incremental Learning of Control Knowledge for Efficiently Obtaining Quality Plans. Artificial Intelligence Review 11(1–5): 371–405.

Bylander, T. 1992. Complexity Results for Serial Decomposability. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), 729–734. Menlo Park, Calif.: American Association for Artificial Intelligence.

Calistri-Yeh, R.; Segre, A.; and Sturgill, D. 1996. The Peaks and Valleys of ALPS: An Adaptive Learning and Planning System for Transportation Scheduling. Paper presented at the Third International Conference on Artificial Intelligence Planning Systems (AIPS-96), 29–31 May, Edinburgh, United Kingdom.

Carbonell, J. G., and Gil, Y. 1990. Learning by Experimentation: The Operator Refinement Method. In Machine Learning: An Artificial Intelligence Approach, Volume 3, eds. Y. Kodtratoff and R. S. Michalski, 191–213. San Francisco, Calif.: Morgan Kaufmann.

Chien, S. A. 1989. Using and Refining Simplifications: Explanation-Based Learning of Plans in Intractable Domains. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 590–595. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Cohen, W. W. 1990. Learning Approximate Control Rules of High Utility. Paper presented at the Seventh International Conference on Machine Learning, 21–23 June, Austin, Texas.

Cohen, W. W., and Singer, Y. 1999. A Simple, Fast, and Effective Rule Learner. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 335–342. Menlo Park, Calif.: American Association for Artificial Intelligence.

Craven, M., and Shavlik, J. 1993. Learning Symbolic Rules Using Artificial Neural Networks. Paper presented at the Tenth International Conference on Machine Learning, 24–27 July, London, United Kingdom.

Dawson, C., and Siklossy, L. 1977. The Role of Preprocessing in Problem-Solving Systems. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, 465–471. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Dearden, R.; Friedman, N.; and Russell, S. 1998. Bayesian Q-Learning. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), 761–768. Menlo Park, Calif.: American Association for Artificial Intelligence.

Dempster, A. P.; Laird, N. M.; and Rubin, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society B 39(1): 1–38.

Dietterich, T. G., and Flann, N. S. 1997. Explanation-Based Learning and Reinforcement Learning: A Unified View. Machine Learning 28:169–210.

Do, B., and Kambhampati, S. 2003. Planning as Constraint Satisfaction: Solving the Planning Graph by Compiling It into a CSP. Artificial Intelligence 132:151–182.

Estlin, T. A., and Mooney, R. J. 1996. Multi-Strategy Learning of Search Control for Partial-Order Planning. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 843–848. Menlo Park, Calif.: American Association for Artificial Intelligence.

Etzioni, O. 1993. Acquiring Search-Control Knowledge via Static Analysis. Artificial Intelligence 62(2): 265–301.

Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2(3–4): 189–208.

Fikes, R. E.; Hart, P.; and Nilsson, N. J. 1972. Learning and Executing Generalized Robot Plans. Artificial Intelligence 3:251–288.

Fox, M., and Long, D. 1999. The Detection and Exploitation of Symmetry in Planning Problems. Paper presented at the Sixteenth International Joint Conference on Artificial Intelligence, 31 July–6 August, Stockholm, Sweden.

Fox, M., and Long, D. 1998. The Automatic Inference of State Invariants in TIM. Journal of Artificial Intelligence Research 9:317–371.

Fu, L.-M. 1989. Integration of Neural Heuristics into Knowledge-Based Inference. Connection Science 1(3): 325–340.

García-Martínez, R., and Borrajo, D. 2000. An Integrated Approach of Learning, Planning, and Execution. Journal of Intelligent and Robotic Systems 29(1): 47–78.

Gerevini, A., and Schubert, L. 1998. Inferring State Constraints for Domain-Independent Planning. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 905–912. Menlo Park, Calif.: American Association for Artificial Intelligence.

Gerevini, A., and Schubert, L. 1996. Accelerating Partial-Order Planners: Some Techniques for Effective Search Control and Pruning. Journal of Artificial Intelligence Research 5:95–137.

Gil, Y. 1994. Learning by Experimentation: Incremental Refinement of Incomplete Planning Domains. Paper presented at the Eleventh International Conference on Machine Learning, 10–13 July, New Brunswick, New Jersey.

Gratch, J., and DeJong, G. 1992. COMPOSER: A Probabilistic Solution to the Utility Problem in Speed-Up Learning. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), 235–240. Menlo Park, Calif.: American Association for Artificial Intelligence.

Hammond, K. 1989. Case-Based Planning: Viewing Planning as a Memory Task. San Diego, Calif.: Academic Press.

Hanks, S., and Weld, D. 1995. A Domain-Independent Algorithm for Plan Adaptation. Journal of Artificial Intelligence Research 2:319–360.

Hinton, G. E. 1989. Connectionist Learning Procedures. Artificial Intelligence 40(1–3): 185–234.

Hofstadter, D. R., and Marshall, J. B. D. 1996. Beyond Copycat: Incorporating Self-Watching into a Computer Model of High-Level Perception and Analogy Making. Paper presented at the 1996 Midwest Artificial Intelligence and Cognitive Science Conference, 26–28 April, Bloomington, Indiana.

Hofstadter, D. R., and Marshall, J. B. D. 1993. A Self-Watching Cognitive Architecture of High-Level Perception and Analogy-Making. Technical Report TR100, Center for Research on Concepts and Cognition, Indiana University.

Hollatz, J. 1999. Analogy Making in Legal Reasoning with Neural Networks and Fuzzy Logic. Artificial Intelligence and Law 7(2–3): 289–301.

Horvitz, E.; Ruan, Y.; Gomes, C.; Kautz, H.; Selman, B.; and Chickering, D. M. 2001. A Bayesian Approach to Tackling Hard Computational Problems. Paper presented at the Seventeenth Conference on Uncertainty in Artificial Intelligence, 2–5 August, Seattle, Washington.

Huang, Y.; Kautz, H.; and Selman, B. 2000. Learning Declarative Control Rules for Constraint-Based Planning. Paper presented at the Seventeenth International Conference on Machine Learning, 29 June–2 July, Stanford, California.

Hunt, E. B.; Marin, J.; and Stone, P. J. 1966. Experiments in Induction. San Diego, Calif.: Academic Press.

Ihrig, L., and Kambhampati, S. 1997. Storing and Indexing Plan Derivations through Explanation-Based Analysis of Retrieval Failures. Journal of Artificial Intelligence Research 7:161–198.

Ihrig, L., and Kambhampati, S. 1996. Design and Implementation of a Replay Framework Based on a Partial-Order Planner. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96). Menlo Park, Calif.: American Association for Artificial Intelligence.

Jones, R., and Langley, P. 1995. Retrieval and Learning in Analogical Problem Solving. In Proceedings of the Seventeenth Conference of the Cognitive Science Society, 466–471. Pittsburgh, Pa.: Lawrence Erlbaum.

Kakuta, T.; Haraguchi, M.; Midori-ku, N.; and Okubo, Y. 1997. A Goal-Dependent Abstraction for Legal Reasoning by Analogy. Artificial Intelligence and Law 5(1–2): 97–118.

Kambhampati, S. 2000. Planning Graph as (Dynamic) CSP: Exploiting EBL, DDB, and Other CSP Techniques in GRAPHPLAN. Journal of Artificial Intelligence Research 12:1–34.

Kambhampati, S. 1998. On the Relations between Intelligent Backtracking and Failure-Driven Explanation-Based Learning in Constraint Satisfaction and Planning. Artificial Intelligence 105(1–2): 161–208.

Kambhampati, S., and Hendler, J. 1992. A Validation Structure–Based Theory of Plan Modification and Reuse. Artificial Intelligence 55(2–3): 193–258.

Kambhampati, S.; Katukam, S.; and Qu, Y. 1996. Failure-Driven Dynamic Search Control for Partial Order Planners: An Explanation-Based Approach. Artificial Intelligence 88(1–2): 253–315.

Kautz, H., and Selman, B. 1999. BLACKBOX: Unifying SAT-Based and Graph-Based Planning. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), 318–325. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Kautz, H., and Selman, B. 1998. The Role of Domain-Specific Knowledge in the Planning as Satisfiability Framework. Paper presented at the Fifth International Conference on Planning and Scheduling (AIPS-98), 7–10 June, Pittsburgh, Pennsylvania.

Kedar-Cabelli, S., and McCarty, T. 1987. Explanation-Based Generalization as Resolution Theorem Proving. In Proceedings of the Fourth International Workshop on Machine Learning, 383–389. San Francisco, Calif.: Morgan Kaufmann.

Khardon, R. 1999. Learning Action Strategies for Planning Domains. Artificial Intelligence 113(1–2): 125–148.

Knoblock, C. 1990. Learning Abstraction Hierarchies for Problem Solving. In Proceedings of the Eighth National Conference on Artificial Intelligence, 923–928. Menlo Park, Calif.: American Association for Artificial Intelligence.

Kodtratoff, Y., and Michalski, R. S., eds. 1990. Machine Learning: An Artificial Intelligence Approach, Volume 3. San Francisco, Calif.: Morgan Kaufmann.

Korf, R. 1990. Real-Time Heuristic Search. Artificial Intelligence 42(2–3): 189–211.

Laird, J.; Newell, A.; and Rosenbloom, P. 1987. SOAR: An Architecture for General Intelligence. Artificial Intelligence 33(1): 1–64.

Lang, K. 1995. NEWSWEEDER: Learning to Filter Netnews. In Proceedings of the Twelfth International Conference on Machine Learning, 331–339. San Francisco, Calif.: Morgan Kaufmann.

Langley, P. 1997. Challenges for the Application of Machine Learning. In Proceedings of the ICML '97 Workshop on Machine Learning Application in the Real World: Methodological Aspects and Implications, 15–18. San Francisco, Calif.: Morgan Kaufmann.

Lau, T.; Domingos, P.; and Weld, D. 2000. Version Space Algebra and Its Application to Programming by Demonstration. Paper presented at the Seventeenth International Conference on Machine Learning, 29 June–2 July, Stanford, California.

Lavrac, N.; Dzeroski, S.; and Grobelnik, M. 1991. Learning Nonrecursive Definitions of Relations with LINUS. In Proceedings of the Fifth European Working Session on Learning, 265–281. Berlin: Springer.



Leake, D.; Kinley, A.; and Wilson, D. 1996. Acquiring Case Adaptation Knowledge: A Hybrid Approach. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 684–689. Menlo Park, Calif.: American Association for Artificial Intelligence.

Leckie, C., and Zukerman, I. 1998. Inductive Learning of Search Control Rules for Planning. Artificial Intelligence 101(1–2): 63–98.

Martin, M., and Geffner, H. 2000. Learning Generalized Policies in Planning Using Concept Languages. In Proceedings of the Seventh International Conference on Knowledge Representation and Reasoning (KR 2000), 667–677. San Francisco, Calif.: Morgan Kaufmann.

Minton, S., ed. 1993. Machine Learning Methods for Planning. San Francisco, Calif.: Morgan Kaufmann.

Minton, S.; Carbonell, J.; Knoblock, C.; Kuokka, D. R.; Etzioni, O.; and Gil, Y. 1989. Explanation-Based Learning: A Problem-Solving Perspective. Artificial Intelligence 40:63–118.

Mitchell, T. M., and Thrun, S. B. 1995. Learning Analytically and Inductively. In Mind Matters: A Tribute to Allen Newell (Carnegie Symposia on Cognition), eds. J. D. Steier, T. Mitchell, and A. Newell. New York: Lawrence Erlbaum.

Mitchell, T.; Keller, R.; and Kedar-Cabelli, S. 1986. Explanation-Based Generalization: A Unifying View. Machine Learning 1(1): 47–80.

Muggleton, S., and Feng, C. 1990. Efficient Induction of Logic Programs. Paper presented at the First Conference on Algorithmic Learning Theory, 8–10 October, Tokyo, Japan.

Munoz-Avila, H.; Aha, D. W.; Breslow, L.; and Nau, D. 1999. HICAP: An Interactive Case-Based Planning Architecture and Its Application to Noncombatant Evacuation Operations. In Proceedings of the Ninth Conference on Innovative Applications of Artificial Intelligence, 879–885. Menlo Park, Calif.: American Association for Artificial Intelligence.

Nau, D.; Cao, Y.; Lotem, A.; and Munoz-Avila, H. 1999. SHOP: Simple Hierarchical Ordered Planner. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), 968–975. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Nebel, B.; Dimopoulos, Y.; and Koehler, J. 1997. Ignoring Irrelevant Facts and Operators in Plan Generation. Paper presented at the Fourth European Conference on Planning (ECP-97), 24–26 September, Toulouse, France.

Ourston, D., and Mooney, R. 1994. Theory Refinement Combining Analytical and Empirical Methods. Artificial Intelligence 66(2): 273–309.

Pazzani, M. J.; Brunk, C. A.; and Silverstein, G. 1991. A Knowledge-Intensive Approach to Learning Relational Concepts. Paper presented at the Eighth International Workshop on Machine Learning, 27–29 June, Evanston, Illinois.

Perez, M., and Etzioni, O. 1992. DYNAMIC: A New Role for Training Problems in EBL. Paper presented at the Ninth International Conference on Machine Learning, 27–30 June, Bled, Slovenia.

Pomerleau, D. A. 1993. Knowledge-Based Training of Artificial Neural Networks for Autonomous Robot Driving. In Robot Learning, eds. J. Connell and S. Mahadevan, 19–43. Boston: Kluwer Academic.

Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. San Francisco, Calif.: Morgan Kaufmann.

Quinlan, J. R. 1990. Learning Logical Definitions from Relations. Machine Learning 5:239–266.

Quinlan, J. R. 1986. Induction of Decision Trees. Machine Learning 1(1): 81–106.

Reddy, C., and Tadepalli, P. 1999. Learning Horn Definitions: Theory and an Application to Planning. New Generation Computing 17(1): 77–98.

Rintanen, J. 2000. An Iterative Algorithm for Synthesizing Invariants. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and the Twelfth Innovative Applications of AI Conference, 806–811. Menlo Park, Calif.: American Association for Artificial Intelligence.

Sacerdoti, E. 1974. Planning in a Hierarchy of Abstraction Spaces. Artificial Intelligence 5(2): 115–135.

Samuel, A. L. 1959. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development 3(3): 210–229.

Schmill, M.; Oates, T.; and Cohen, P. 2000. Learning Planning Operators in Real-World, Partially Observable Environments. Paper presented at the Fifth Conference on Artificial Intelligence Planning Systems (AIPS-2000), 14–17 April, Breckenridge, Colorado.

Shavlik, J. W., and Towell, G. G. 1989. An Approach to Combining Explanation-Based and Neural Learning Algorithms. Connection Science 1(3): 231–253.

Sheppard, J., and Salzberg, S. 1995. Combining Genetic Algorithms with Memory-Based Reasoning. Paper presented at the Sixth International Conference on Genetic Algorithms, 15–19 July, Pittsburgh, Pennsylvania.

Smith, D., and Peot, M. 1993. Postponing Threats in Partial-Order Planning. In Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93), 500–506. Menlo Park, Calif.: American Association for Artificial Intelligence.

Sutton, R. 1991. Planning by Incremental Dynamic Programming. In Proceedings of the Eighth International Conference on Machine Learning, 353–357. San Francisco, Calif.: Morgan Kaufmann.

Sutton, R. 1988. Learning to Predict by the Methods of Temporal Differences. Machine Learning 3(1): 9–44.

Sutton, R., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, Mass.: MIT Press.

Sycara, K.; Guttal, R.; Koning, J.; Narasimhan, S.; and Navinchandra, D. 1992. CADET: A Case-Based Synthesis Tool for Engineering Design. International Journal of Expert Systems 4(2).

Veloso, M., and Carbonell, J. 1993. Derivational Analogy in PRODIGY: Automating Case Acquisition, Storage, and Utilization. Machine Learning 10(3): 249–278.

Wang, X. 1996a. A Multistrategy Learning System for Planning Operator Acquisition. Paper presented at the Third International Workshop on Multistrategy Learning, 23–25 May, Harpers Ferry, West Virginia.

Wang, X. 1996b. Planning While Learning Operators. Paper presented at the Third International Conference on Artificial Intelligence Planning Systems (AIPS-96), 29–31 May, Edinburgh, Scotland.

Watkins, C. 1989. Learning from Delayed Rewards. Ph.D. dissertation, Psychology Department, University of Cambridge, Cambridge, United Kingdom.

Wolfman, S., and Weld, D. 1999. The LPSAT Engine and Its Application to Resource Planning. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 310–317. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Zelle, J., and Mooney, R. 1993. Combining FOIL and EBG to Speed Up Logic Programs. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1106–1111. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Zimmerman, T., and Kambhampati, S. 2002. Generating Parallel Plans Satisfying Multiple Criteria in Anytime Fashion. Paper presented at the Sixth International Conference on Artificial Intelligence Planning and Scheduling Workshop on Planning and Scheduling with Multiple Criteria, 23–27 April, Toulouse, France.

Zimmerman, T., and Kambhampati, S. 1999. Exploiting Symmetry in the Planning Graph via Explanation-Guided Search. Paper presented at the Sixteenth National Conference on Artificial Intelligence, 18–22 July, Orlando, Florida.

Zweben, M.; Davis, E.; Daun, B.; Drascher, E.; Deale, M.; and Eskey, M. 1992. Learning to Improve Constraint-Based Scheduling. Artificial Intelligence 58(1–3): 271–296.

Terry Zimmerman has an M.S. in nuclear science and engineering and is in the throes of completing his Ph.D. in computer science and engineering at Arizona State University. His dissertation study in the area of automated planning combines aspects of heuristic search control, learning, and optimization over multiple quality criteria. He previously conducted probabilistic risk assessment and reliability analysis for energy facilities and developed software for statistical analysis of experimental nuclear fuel assemblies. His e-mail address is [email protected].

Subbarao Kambhampati is a professor of computer science and engineering at Arizona State University, where he directs the Yochan research group. His current research interests are in automated planning, scheduling, learning, and information integration. He received his formative education in Peddapuram, his B.Tech from the Indian Institute of Technology, Madras (Chennai), and his M.S. and Ph.D. from the University of Maryland at College Park. His e-mail address is [email protected].
