
Page 1:

Pat Langley

School of Computing and Informatics, Arizona State University

Tempe, Arizona USA

Institute for the Study of Learning and Expertise, Palo Alto, California USA

Challenges in Learning Plan Knowledge

Thanks to D. Choi, T. Konik, U. Kuter, N. Li, D. Nau, N. Nejati, and D. Shapiro for their many contributions. This talk reports research funded by grants from DARPA IPTO, which is not responsible for its contents.

Page 2:

Outline of the Talk

1. Brief review of learning plan knowledge

2. Learning from different sources

3. Learning for new performance tasks

4. Learning in different scenarios

5. Learning with novel representations

6. Some responses to these challenges

7. Concluding remarks

Page 3:

The Problem: Learning Plan Knowledge

Given: Basic knowledge about some action-oriented domain (e.g., state/goal representation, operators).

Given: A set of training problems (e.g., initial states, goals, and possibly more)

Given: Some performance task that the system must carry out.

Given: A performance mechanism that can use knowledge to carry out that task.

Learn: Knowledge that will let the system improve its ability to perform new tasks from the same or similar domain.
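To make the formulation above concrete, here is a minimal, purely illustrative rendering of its givens as Python data structures; the names Problem and LearningTask are hypothetical and not tied to any particular system.

from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

@dataclass(frozen=True)
class Problem:
    """One training or test problem: an initial state and a set of goals."""
    initial_state: FrozenSet[Tuple[str, ...]]
    goals: FrozenSet[Tuple[str, ...]]

@dataclass
class LearningTask:
    """The givens and the learning target, as listed above."""
    domain_knowledge: object           # Given: state/goal representation, operators
    training_problems: List[Problem]   # Given: initial states, goals, possibly more
    perform: Callable                  # Given: performance mechanism that uses knowledge
    learned_knowledge: Optional[object] = None  # Learn: knowledge that improves later performance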

Page 4:

Topics Not Covered

This talk will range widely, but I will not cover issues related to:

Learning with impoverished representations

Interested in human-like, intelligent behavior

Most work on reinforcement learning is irrelevant

Acquiring basic knowledge about domain

Interested in building on such knowledge

Most work on learning action models is too basic

Nonincremental learning from large data sets

Interested in human-like incremental learning

This rules out most data-mining approaches

Page 5:

Historical Topics

There has been a long history of work on learning plan knowledge:

Forming macro-operators: Fikes et al. (1972), Iba (1988), Mooney (1989), Botea et al. (2005)

Inducing forward-chaining control rules: Anzai & Simon (1978), Mitchell et al. (1981), Langley (1982)

Learning control rules analytically: Laird et al. (1986), Mitchell et al. (1986), Minton (1988)

Problem solving by analogy: Veloso (1994), Jones & Langley (1995), VanLehn & Jones (1994)

Inducing control rules for partial-order plans: Katukam & Kambhampati (1994), Estlin & Mooney (1997)

Page 6:

Historical Trends

Work on learning plan knowledge has seen many shifts in fashion:

Early hope for improving problem solvers/planners (1978-1985)

Excitement/confusion introduced by the EBL movement (1986-1992)

Some doubts raised by the “utility problem” (1988-1993)

Mass migration to the reinforcement learning paradigm (1993-2003)

Resurgence of interest in learning plan knowledge (2004-present)

Throughout these changes, the problems and potential of learning plan knowledge have remained.

Page 7:

Traditional Sources of Information

Most research on learning for planning has assumed the system uses search to generate:

Successful paths that achieve the goals (positive instances)

Failed paths that do not achieve the goals (negative instances)

Alternative paths of different desirability (preferred instances)

But humans learn from other sources of information and our AI systems should as well.

Page 8:

Challenge: Learn from Many Sources

There has been relatively little research on plan learning from:

Demonstrations of solved problems (Nejati et al., 2006)

Explicit instruction from teacher (Blythe et al., 2007)

Advice or hints from teacher (Mostow, 1983)

Mental simulations or daydreaming (Mueller, 1985)

Undesirable side effects during execution

Humans learn from all of these sources, and our learning systems should support the same capabilities.

Moreover, we should develop single systems that integrate plan knowledge learned from all of them (Oblinger, 2006).

Page 9:

Traditional Performance Tasks

Most research on learning for planning has assumed the system aims to improve:

The efficiency of plan generation (nodes expanded, time)

The quality of generated plans (path length, utility)

The coverage of plan knowledge (problems solved)

But humans learn and use plan knowledge for other purposes that are just as valid.

Page 10:

Challenge: Learn for Plan Execution

Many important domains require executing plan knowledge in some environment that includes:

operators with likely but nonguaranteed effects

external events not directly under the agent's control

other agents that are pursuing their own goals

Urban driving is one setting that raises all three of these issues.

Complex board games like chess, although deterministic, still require interleaving of planning and execution.

We need more research on plan learning in contexts of this sort (e.g., Benson, 1995; Fern et al., 2004).

Page 11:

Challenge: Learn for Plan Understanding

Another understudied problem is learning for plan understanding.

Given: A partially observed sequence of states influenced by another agent’s actions.

Given: Learned knowledge about how to achieve goals.

Find: The other agent’s goals and the plans it is pursuing to achieve them.

Plan understanding is important not only in complex games, but in military planning, politics, and other settings.

This performance task suggests new learning problems, methods, and evaluation criteria.
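As a naive illustration only (not the approach advocated in this talk), suppose learned plan knowledge is stored as goal-indexed decompositions; a simple recognizer could then test which candidate goals remain consistent with the actions observed so far. All structures and names below are hypothetical.

def flatten(goal, methods):
    """Expand a goal into primitive steps using one decomposition per goal.
    methods: dict mapping a goal to the ordered subgoals of its decomposition;
    anything without an entry is treated as a primitive action. Assumes the
    decompositions are acyclic."""
    if goal not in methods:
        return [goal]
    steps = []
    for sub in methods[goal]:
        steps.extend(flatten(sub, methods))
    return steps

def consistent_goals(observed_actions, candidates, methods):
    """Return candidate goals whose expansions contain the observed actions in order."""
    result = []
    for goal in candidates:
        expansion = iter(flatten(goal, methods))
        # 'a in expansion' consumes the iterator, so this checks an ordered match
        if all(a in expansion for a in observed_actions):
            result.append(goal)
    return result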

Page 12:

Traditional Learning Scenarios

Most research on learning for planning has assumed the system:

Trains on problems from a given distribution / domain

Tests on problems from the same distribution / domain

Success depends on the extent to which the learner generalizes well to new problems from the same domain.

But humans also use their learned plan knowledge in other, more flexible ways to improve performance.

Page 13:

Challenge: Cumulative Learning

In complex domains, humans learn plan knowledge gradually:

Starting with small, relatively easy problems

Moving to complex problems after mastering simpler ones

Later acquisitions build naturally on earlier experience, leading to cumulative learning.

Our education system depends heavily on such “vertical transfer” of learned knowledge.

We need more learning systems that demonstrate this form of cumulative improvement (e.g., Reddy & Tadepalli, 1997).

Page 14:

Challenge: Cross-Domain Transfer

In other cases, humans exhibit a form of transfer that involves:

Learning to solve problems in one domain

Reusing this knowledge to solve problems in another domain that is superficially quite different

Such cross-domain transfer is related to within-domain analogical reasoning, but it is far more challenging.

In its extreme form, the two domains support similar solutions but have no shared symbols or predicates.

We need more learning systems that demonstrate this radical form of knowledge reuse.

Page 15:

Traditional Learned Representations

Most research on learning for planning has focused on learning:

Control rules that reduce effective branching factor

Macro-operators that reduce effective solution depth

These grew naturally from representations used to create hand-crafted expert problem solvers.

But now we have other representations of plan knowledge that suggest new learning tasks and methods.

By these representations I do not mean POMDPs, workflows, or other highly constrained formalisms.

Page 16:

Challenge: Learn HTNs

Hierarchical task networks (HTNs) offer the most effective planning available, but they are expensive to build manually.

HTNs provide an ideal target for learning because they have:

the modularity and flexibility of search-control rules

the large-scale structure of macro-operators

Machine learning has automated the creation of expert classifiers.

We should do the same for HTNs, which are effectively expert planning systems.

Page 17:

Challenge: Learn HTNs

We can define the task of learning hierarchical task networks as:

Given: Basic knowledge about some action-oriented domain

Given: A set of training problems (initial states and goals)

Given: Some performance task the system must carry out

Given: Some module that uses HTNs to perform this task

Learn: An HTN that lets the system improve its performance on new tasks from the same or similar domain.

We need more research on this important topic (e.g., Reddy & Tadepalli, 1997; Ilghami et al., 2005).

Page 18:

Some Responses

Our recent research attempts to respond to these challenges by developing methods that:

acquire a constrained but important class of HTNs

that one can use for both planning and reactive control

from both successful problem solving and expert traces

and that extend naturally to support cross-domain transfer

Moreover, these ideas are embedded in an integrated architecture that supports many capabilities: ICARUS (Langley, 2006).

Page 19:

Conceptual Knowledge in ICARUS

Conceptual knowledge is cast as Horn clauses that specify relevant relations in the environment.

Memory is organized hierarchically and divided into primitive and nonprimitive predicates.

Primitive concept: (assigned-mission ?patient ?mission)

Nonprimitive concept: (patient-form-filled ?patient)
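The sketch below illustrates, in a ground (variable-free) and purely hypothetical form, how such a hierarchy lets the agent infer nonprimitive beliefs from primitive ones; the concept body shown is invented for the example, not taken from the actual domain definition.

# Each nonprimitive concept is a Horn clause: it holds when every lower-level
# concept in its body holds. Primitive concepts come directly from percepts.
rules = {
    "patient-form-filled": {"assigned-mission", "personal-data-recorded"},  # hypothetical definition
}

def infer_beliefs(primitive_beliefs, rules):
    """Compute the belief state by bottom-up chaining to a fixed point."""
    beliefs = set(primitive_beliefs)
    changed = True
    while changed:
        changed = False
        for head, body in rules.items():
            if head not in beliefs and body <= beliefs:
                beliefs.add(head)
                changed = True
    return beliefs

print(infer_beliefs({"assigned-mission", "personal-data-recorded"}, rules))
# the inferred beliefs now also include 'patient-form-filled'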

Page 20:

HTN Methods in ICARUS

Similar to SHOP2, but methods are indexed by the goals they achieve. Each method decomposes a goal into subgoals.

If a method’s goal is active and its precondition is satisfied, then try to achieve its subgoals or apply its operators

[Figure: an HTN method links a goal concept and a precondition concept to subgoals, which in turn bottom out in operators.]
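The rule above ("if a method's goal is active and its precondition is satisfied, try to achieve its subgoals or apply its operators") can be rendered, very roughly, as the sketch below; this is a simplified, hypothetical ground version, not ICARUS's actual execution cycle.

def achieve(goal, state, methods, operators):
    """Achieve a goal using goal-indexed methods (simplified ground sketch).
    methods: goal -> list of (precondition_set, ordered_subgoals)
    operators: goal -> primitive action that achieves it
    Returns a list of primitive actions; assumes each pursued subgoal comes to
    hold and ignores any other operator effects."""
    if goal in state:                              # the goal already holds
        return []
    for precondition, subgoals in methods.get(goal, ()):
        if precondition <= state:                  # the method is applicable
            plan = []
            for sub in subgoals:
                plan += achieve(sub, state, methods, operators)
                state = state | {sub}              # assume the subgoal now holds
            return plan
    if goal in operators:                          # no method: fall back to a primitive
        return [operators[goal]]
    raise ValueError(f"no way to achieve {goal!r}")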

Page 21:

Operators in ICARUS

Action: (get-arrival-time ?patient ?from ?to)

Precondition concept: (patient ?p) and (travel-from ?p ?from) and (travel-to ?p ?to)

Effects concept: (arrival-time ?patient)

Operators describe low-level actions that agents can execute directly in the environment

Preconditions: legal conditions for action execution

Effects: expected changes when the action is executed
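A minimal data-structure view of such an operator instance, again purely as an illustration (delete effects and other details are omitted, and the example action is hypothetical):

from dataclasses import dataclass

@dataclass(frozen=True)
class GroundOperator:
    """A fully instantiated operator: an action plus precondition and effect literals."""
    action: tuple             # e.g., ("get-arrival-time", "P2", "CityA", "CityB")  (hypothetical)
    preconditions: frozenset
    effects: frozenset

    def applicable(self, state):
        # legal to execute only when every precondition literal holds
        return self.preconditions <= state

    def expected_result(self, state):
        # effects are expected, not guaranteed, changes to the state
        return state | self.effects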

Page 22:

Training Input: Expert Traces and Goals

Expert demonstration traces: the operators the expert uses and the resulting belief states

State: a set of concept instances

Goal: a concept instance that holds in the final state

ICARUS learns generalized skills that achieve similar goals.

Examples from a trace:

Operator instance: (get-arrival-time P2)

Concept instance in the state: (assigned-flight P1 M1)

Goal concept: (all-patients-arranged)
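Such a trace might be represented along the following lines; this is an illustrative structure rather than ICARUS's internal format, though the example instances come from the slide.

from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass
class TraceStep:
    """One step of an expert demonstration: the operator instance the expert
    applied and the belief state (concept instances) that resulted from it."""
    operator: Tuple[str, ...]              # e.g., ("get-arrival-time", "P2")
    beliefs: FrozenSet[Tuple[str, ...]]    # e.g., contains ("assigned-flight", "P1", "M1")

@dataclass
class DemonstrationTrace:
    steps: List[TraceStep]                 # the ordered steps of the demonstration
    goal: Tuple[str, ...]                  # a concept instance true in the final state,
                                           # e.g., ("all-patients-arranged",)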

Page 23:

Learning Plan Knowledge from Demonstration

[Diagram: the LIGHT learner receives a problem (an initial state and goal), expert demonstration traces (states and actions), and background knowledge (concept definitions and operators); it produces learned plan knowledge in the form of HTNs, which a reactive executor uses, with learning triggered when an impasse arises.]

Page 24:

Learning HTNs by Trace Analysis

[Figure: the solution trace is explained in terms of concepts and actions.]

Page 25:

Learning HTNs by Trace Analysis: Operator Chaining

[Figure: operator chaining in the trace analysis.]

Page 26:

Learning HTNs by Trace Analysis: Concept Chaining

[Figure: concept chaining in the trace analysis, relating concepts to actions.]
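The two forms of chaining above can be paraphrased, very roughly, as the recursive procedure below. This is only a sketch of the operator-chaining and concept-chaining ideas, not the published algorithm: it ignores variablization and ordering constraints, treats every precondition as a subgoal rather than distinguishing conditions already true at the start, and assumes the explanation is acyclic.

def learn_methods(goal, trace, concepts, operators, methods):
    """Learn HTN methods by explaining how a trace achieved a goal.
    trace: ordered operator instances the expert executed
    concepts: nonprimitive concept -> the subconcepts in its definition
    operators: operator instance -> (preconditions, effects)
    methods: list accumulating (goal, precondition, subgoals) triples."""
    # Operator chaining: some executed operator has the goal among its effects,
    # so learn a method that achieves its preconditions and then applies it.
    for op in reversed(trace):
        preconditions, effects = operators[op]
        if goal in effects:
            methods.append((goal, frozenset(preconditions), list(preconditions) + [op]))
            for p in preconditions:
                learn_methods(p, trace, concepts, operators, methods)
            return
    # Concept chaining: no operator achieves the goal directly, so decompose it
    # through its concept definition and explain each subconcept in turn.
    if goal in concepts:
        subgoals = list(concepts[goal])
        methods.append((goal, frozenset(), subgoals))
        for sub in subgoals:
            learn_methods(sub, trace, concepts, operators, methods)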

Page 27:

Explanation Structure for Trace

[Figure: an explanation structure spanning three time steps, built from ground literals such as (dest-airport patient1 SFO), (arrival-time NW32 1pm), (query-arrival-time), (scheduled NW32), (location patient1 SFO 1pm), (assigned patient1 NW32), (flight-available), (assign patient1 NW32), (transfer-hospital patient1 hospital2), (arrange-ground-transportation SFO hospital2 1pm), and (close-airport hospital2 SFO).]

Page 28:

Hierarchical Task Network Structure

[Figure: the same structure generalized into an HTN, with constants replaced by variables: (dest-airport ?patient ?loc), (arrival-time ?flight ?time), (query-arrival-time), (scheduled ?flight), (location ?patient ?loc ?time), (assigned ?patient ?flight), (flight-available), (assign ?patient ?flight), (transfer-hospital ?patient ?hospital), (arrange-ground-transportation ?loc ?hospital ?time), and (close-airport ?hospital ?loc).]
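The step from the ground explanation on the previous page to the generalized network above amounts to replacing constants with variables consistently. A small illustration follows; the "?"-plus-lowercase naming is my own convention, whereas the structure above uses more meaningful names such as ?flight and ?time.

def variablize(literal, mapping):
    """Replace the constants of a ground literal with variables, reusing the
    same variable wherever the same constant appears (mapping is shared)."""
    predicate, *constants = literal
    variables = []
    for constant in constants:
        if constant not in mapping:
            mapping[constant] = "?" + constant.lower()
        variables.append(mapping[constant])
    return (predicate, *variables)

mapping = {}
print(variablize(("assigned", "patient1", "NW32"), mapping))
print(variablize(("arrival-time", "NW32", "1pm"), mapping))
# the constant NW32 maps to the same variable in both literals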

Page 29:

Transfer by Representation Mapping

[Figure: predicate mappings link the concepts and actions of a source domain to those of a target domain.]
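To picture how such mappings might be used, here is a sketch under the method representation from the earlier examples: once a predicate mapping between the domains has been found, learned methods can be rewritten into the target vocabulary. The mapping shown is hypothetical, and finding it is of course the hard part of the problem.

def map_method(method, predicate_map):
    """Rewrite a learned method's goal, precondition, and subgoals under a
    source-to-target predicate mapping; unmapped predicates pass through."""
    def rename(literal):
        predicate, *args = literal
        return (predicate_map.get(predicate, predicate), *args)

    goal, precondition, subgoals = method
    return (rename(goal),
            frozenset(rename(p) for p in precondition),
            [rename(s) for s in subgoals])

# Hypothetical mapping between superficially different source and target domains.
predicate_map = {"assigned": "allocated", "flight-available": "vehicle-available"}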

Page 30:

Challenge: Learn with Richer Goals

HTNs are more expressive than classical plans (Erol et al., 1994).

Our approach loses this advantage because it assumes the head of each method is a goal it achieves, but we can:

Extend goal concepts to describe temporal behavior

Revise the execution module to handle these structures

Augment trace analysis to reason about temporal goals

Learn new methods with temporal goals in their heads

This scheme should acquire the full class of HTNs while still retaining the tractability of goal-directed learning.

Page 31:

Challenge: Extend Conceptual Vocabularies

Our approach to learning HTNs relies on the concept hierarchy used to explain solution traces.

The method would be less dependent on that hierarchy if it could extend it:

Given: A set of concepts used in goals, states, and methods

Given: New methods acquired from sample solution traces

Find: New concepts that produce improved performance as the result of future method learning

This would support a bootstrapped learner that invents predicates to describe states, goals, and methods.

Page 32:

Challenge: Extend Conceptual Vocabularies

Our approach to utilizing predicate invention has three steps:

Define a new concept for the precondition of each method learned by chaining off a concept definition.

Check traces for states in which this concept becomes true and learn methods to achieve it.

During performance, treat each method's precondition as its first subgoal, which it can achieve if submethods are known.

This technique would make an HTN more complete by growing it downward, introducing nonterminal symbols as necessary.

We have partially implemented this scheme and hope to report results at the next meeting.
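A rough rendering of the first and third steps, using the (goal, precondition, subgoals) method representation from the earlier sketches; learning methods that achieve each invented concept (the second step) would reuse the trace analysis sketched before. The invented predicate names are placeholders.

def invent_precondition_concepts(methods, concepts):
    """For each method with a nonempty precondition, introduce a fresh concept
    defined by that precondition and post it as the method's first subgoal."""
    new_methods = []
    for i, (goal, precondition, subgoals) in enumerate(methods):
        if precondition:
            name = ("invented-precond-%d" % i,)      # new nonterminal predicate
            concepts[name[0]] = set(precondition)    # its definition is the precondition
            new_methods.append((goal, frozenset(), [name] + subgoals))
        else:
            new_methods.append((goal, precondition, subgoals))
    return new_methods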

Page 33:

Concluding Remarks: Research Style

Clearly, there remain many open problems to address in learning plan knowledge.

These involve new abilities, not improvements on existing ones, which suggests that we:

Look at human behavior for ideas on how to proceed

Develop integrated systems rather than component algorithms

Demonstrate their behavior on challenging domains

These strategies will help us extend the reach of our learning systems, not just strengthen their grasp.

Page 34:

Concluding Remarks: Evaluation

We must evaluate our new plan learners, but this does not mean:

Measuring their speed in generating plans

Showing they run faster than existing systems

Entering them in planning competitions

More appropriate experiments would revolve around:

Demonstrating entirely new functionalities

Running lesion studies to show new features are required

Using performance measures appropriate to the task

These steps will produce conceptual advances and scientific understanding far more than will mindless bake-offs.

Page 35:

Concluding Remarks: Summary

Learning plan knowledge is a key area with many open problems:

Learning from traces, advice, and other sources

Transferring knowledge within and across domains

Learning and extending rich structures like HTNs

These challenges will benefit from earlier work on plan learning, but they also require new ideas.

Together, they should lead us toward learning systems that rival humans in their flexibility and power.

Page 36:

End of Presentation

Page 37:

ICARUS Concepts for In-City Driving

((in-rightmost-lane ?self ?clane)
 :percepts  ((self ?self) (segment ?seg) (line ?clane segment ?seg))
 :relations ((driving-well-in-segment ?self ?seg ?clane)
             (last-lane ?clane)
             (not (lane-to-right ?clane ?anylane))))

((driving-well-in-segment ?self ?seg ?lane)
 :percepts  ((self ?self) (segment ?seg) (line ?lane segment ?seg))
 :relations ((in-segment ?self ?seg)
             (in-lane ?self ?lane)
             (aligned-with-lane-in-segment ?self ?seg ?lane)
             (centered-in-lane ?self ?seg ?lane)
             (steering-wheel-straight ?self)))

((in-lane ?self ?lane)
 :percepts ((self ?self segment ?seg) (line ?lane segment ?seg dist ?dist))
 :tests    ((> ?dist -10) (<= ?dist 0)))

Page 38:

Representing Short-Term Beliefs/Goals

(current-street me A) (current-segment me g550)
(lane-to-right g599 g601) (first-lane g599)
(last-lane g599) (last-lane g601)
(at-speed-for-u-turn me) (slow-for-right-turn me)
(steering-wheel-not-straight me) (centered-in-lane me g550 g599)
(in-lane me g599) (in-segment me g550)
(on-right-side-in-segment me) (intersection-behind g550 g522)
(building-on-left g288) (building-on-left g425)
(building-on-left g427) (building-on-left g429)
(building-on-left g431) (building-on-left g433)
(building-on-right g287) (building-on-right g279)
(increasing-direction me) (buildings-on-right g287 g279)

Page 39:

ICARUS Skills for In-City Driving

((in-rightmost-lane ?self ?line)
 :percepts ((self ?self) (line ?line))
 :start    ((last-lane ?line))
 :subgoals ((driving-well-in-segment ?self ?seg ?line)))

((driving-well-in-segment ?self ?seg ?line)
 :percepts ((segment ?seg) (line ?line) (self ?self))
 :start    ((steering-wheel-straight ?self))
 :subgoals ((in-segment ?self ?seg)
            (centered-in-lane ?self ?seg ?line)
            (aligned-with-lane-in-segment ?self ?seg ?line)
            (steering-wheel-straight ?self)))

((in-segment ?self ?endsg)
 :percepts ((self ?self speed ?speed) (intersection ?int cross ?cross)
            (segment ?endsg street ?cross angle ?angle))
 :start    ((in-intersection-for-right-turn ?self ?int))
 :actions  ((steer 1)))

Page 40:

ICARUS Interleaves Execution and Problem Solving

[Figure: given a problem, the architecture checks for an impasse; if none arises, reactive execution applies the skill hierarchy and primitive skills to produce an executed plan, and if one does arise, control passes to problem solving.]

This organization reflects the psychological distinction between automatized and controlled behavior.