
ISLE Team Y2 Plans

Dan Shapiro, Pat Langley

2/2/07

Dimensions of Knowledge Transfer

[Figure: a two-by-two space around the Problem Solver, with Difference in Content on the vertical axis and Difference in Representation on the horizontal axis, containing four transfer regimes.]

• Memorization: We have already solved these problems.

• Knowledge Reuse (similar representations, e.g., within-domain transfer): We have not solved this before, but we know other pertinent information about this domain that uses the same representation.

• Isomorphism (different representations, e.g., most cross-domain transfer): We know the solution to a similar problem with a different representation, possibly from another domain.

• First-Principles Reasoning: We have not solved similar problems, and are not familiar with this domain and problem representation.

Knowledge transfer complexity is determined primarily by differences in the knowledge content and representation between the source and target problems.

Claims about Transfer Learning

Claim: Transfer that produces human rates of learning depends on reusing structures that are relational and composable

Test: Design source/target scenarios which involve shared relational structures that satisfy specified classes of transformations

Example: Draw source and target problems from branches of physics with established relations among statements and solutions

Claim: Deep transfer depends on the ability to discover mappings between superficially different representations

Test: Design source/target scenarios that use different predicates and distinct formulations of states, rules, and goals

Example: Define two games in GGP that are nearly equivalent but have no superficial relationship

Meta-Claim: These claims hold for domains that involve reactive execution, problem-solving search, and conceptual inference

Test: Demonstrate deep transfer in testbeds that need these aspects of cognitive systems

Example: Develop transfer learning agents for Urban Combat, GGP, and Physics

We will explore four paths to deep transfer:

• Predicate invention for representation mapping in Markov logic (Washington)

• Goal-directed solution analysis for hierarchical skill mapping (ISLE)

• Representation mapping through deep structural analogy (Northwestern)

• Semantic learning augmented with procedural chunking (Michigan)

ISLE Team Y2 Technology Components

[Figure: block diagrams of the four component architectures and their planned extensions.]

• The ICARUS Architecture: long-term conceptual and skill memories, short-term conceptual and goal/skill memories, conceptual inference, skill retrieval and execution, problem solving and skill learning, and perceptual and motor buffers connecting to the environment.

• Markov Logic Networks: Markov logic inference via weighted satisfiability and Markov chain Monte Carlo, with structure learning by inductive logic programming and weight learning, applied across source and target domains.

• The Soar Architecture: procedural, semantic, and episodic long-term memories; short-term memory; a decision procedure; chunking, episodic, semantic, and reinforcement learning; and perception and action through the body.

• The Companions Architecture: Facilitator, Executive, Session Manager, Session Reasoner, Interaction Manager, and Visual/Spatial Reasoner on a cluster; a nuSketch GUI with relational concept map on the user's Windows box; MAC/FAC domain, self, and user models with ticklers; SEQL domain, self, and user model generalizers; an interactive explanation interface; and offline learning.

Planned extensions: replace inference with the Alchemy software (Washington), augment with the CYC knowledge base (Cycorp), incorporate HTN planning methods (Maryland), and add methods for learning value functions (UT Austin).

Scientific Claims for Icarus

FLEXIBLE INFERENCE OVER RICH COGNITIVE STRUCTURES
• Improve transfer by making conceptual inference and skill retrieval more robust
• Approach: Combine inference over Markov logic networks (Alchemy), which unify relational logic and probabilistic reasoning, with goal-indexed retrieval of hierarchical task networks (Icarus)

ANALYTICAL DISCOVERY OF CROSS-DOMAIN MAPPINGS
• Improve transfer by specifying mappings between concepts and skills in source and target domains
• Approach: Analyze problem-solving traces to identify similar structures, then use them to generate candidate mappings

PROBABILISTIC LEARNING OF HIDDEN PREDICATES
• Improve transfer by specifying mappings between concepts and skills in source and target domains
• Approach: Use regularities in relational data to postulate hidden predicates that map onto concepts in each domain, then use Markov logic to make inferences for the target based on the source

[Figure: example cross-domain mapping between source concepts and skills (e.g., Ammunition, Surround_enemy) and target concepts and skills (e.g., Combustible_material, Contain_fire).]
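To make the mapping step concrete, the sketch below (an illustration only, not the project's algorithm) scores candidate correspondences between source and target predicates by comparing their arities and relative numbers of ground facts; the predicate names, ground facts, and scoring rule are all assumptions for this example.

```python
from itertools import product

# Toy relational data: predicate -> set of ground argument tuples.
# All names here are illustrative assumptions, echoing the figure above.
source = {
    "Ammunition":     {("crate1",), ("crate2",)},
    "Surround_enemy": {("squad1", "crate1"), ("squad2", "crate2")},
}
target = {
    "Combustible_material": {("shed1",), ("shed2",)},
    "Contain_fire":         {("crew1", "shed1"), ("crew2", "shed2")},
}

def arity(facts):
    return len(next(iter(facts)))

def score(src_facts, tgt_facts):
    """Crude structural similarity: same arity, similar number of ground facts."""
    if arity(src_facts) != arity(tgt_facts):
        return 0.0
    a, b = len(src_facts), len(tgt_facts)
    return min(a, b) / max(a, b)

# Rank candidate predicate correspondences by structural score.
candidates = sorted(
    ((score(source[s], target[t]), s, t) for s, t in product(source, target)),
    reverse=True,
)
for sc, s, t in candidates:
    print(f"{s:15s} -> {t:22s} score={sc:.2f}")
```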

ANALYTICAL INVENTION OF HIGH-UTILITY PREDICATES
• Improve transfer by exploiting symbolic domain knowledge and statistics from experience to generate new concepts; use these to enhance value functions and refine hierarchical skills
• Regression through operators and clause definitions generates new, useful features
• Knowledge compilation and abstraction operations can produce abstract features
• The entire concept derivation tree is available for use in transfer domains

[Figure: example relational datasets used to illustrate learning from observed ground facts: an academic domain (Prof, Student, Paper, Project, Agency, FundedBy(researcher, agency), AdvisedBy(student, researcher)), a movie domain (Director, Actor, Scene, Movie, Company, InvestorOf(director, company), DirectedBy(actor, director)), and a medical domain (HasSymptom, HasDisease, FatherOf, hasGeneticBasis).]

Scientific Claims for Soar

Deliberate Reflection Constructs Declarative and Procedural Generalizations and Abstractions
• Analyze episodes, detecting commonalities across multiple situations and across multiple games
• Store abstractions with expected results in semantic memory (declarative structures) and as chunks (procedures) for future retrieval

  "Does this new game have the concept of pinned in it?"
  "I used pinned in multiple games to help me win. I should use it whenever I can."

Episodic Memory Holds Behavior History for Future Analysis
• Automatically records experiences for future efficient retrieval and playback
• Supports post-hoc analysis and detection of generalizations across multiple states that transfer to new situations
• Supports detection, comparison, and generalization of regularities that occur across multiple game episodes

  "If my enemy 'pins' my piece, I can't move it without losing another piece."

Automatic and Deliberate Retrieval of Stored Results Creates Mappings that Direct Behavior
• Automatically elaborate new situations with simple abstractions
• Deliberately analyze new tasks, attempting to detect previously learned complex abstractions (mappings)
• Retrieve results tied to abstractions stored in semantic memory or as chunks
• Direct behavior using retrieved results: transfer!

Use RL to Tune Abstractions and Generalizations
• Use reinforcement learning to learn when mappings actually help problem solving
• Over time, avoid mappings that only appear to be useful and use mappings that lead to success in the target domain

Scientific Claims for Companions

Self-Modeling Capabilities Will Promote Transfer
• Learn more robust generalizations through focused, off-line analysis
• Approach: Detailed analysis and comparison of game rules and records of played games to evaluate learned knowledge and formulate new learning goals

  "I keep getting pinned! I need to work on that!"

Analogical Encoding Will Promote Transfer at Levels 7-10
• Automatic elaboration and reformulation of game descriptions, to achieve better gameplay with fewer instance-level transfers
• Approach: Aggressive re-representation to improve the productivity of the match, in terms of predictions

  (Original distant analogy: get traction by figuring out which non-identical predicates best align)

Persistent Mappings for Reflection about Transfer
• Improve transfer at levels 7-10 by learning what does and does not transfer
• Approach: Reify mappings; keep track of which inferences are and are not productive

  "These don't look alike yet; I need a different perspective."
  "I can use pawn promotion, which is like crowning!"

Metamappings Will Improve Performance in Far Transfer
• More robust cross-domain analogies, even in distant (level 10) transfer
• Approach: Recursively find analogies between properties of non-identical predicates

Program Requirements
• Go/No-go tests vs. known transfer targets for five transfer types (redefined transfer levels 6-10)
• Conduct careful science; systematic exploration of transfer
• Showcase tasks in MadRTS
• Demonstrate DARPA relevance, technology appeal
• Same performance metric, statistical test, and score aggregation across architectures

Regret Metric

• Shared with Berkeley; basis for both Go/No-go decisions at the end of the year
• Ratio of the area between the learning curves to the area of the bounding box:

  Benefit = (Area between curves) * 100 / (y-range * x-range)

• Naturally scales with problem difficulty and has an easy interpretation as a percentage improvement

Example scores:
  Type-1 Benefit (x = 1 through 5):   30.9218
  Type-2 Benefit (x = 6 through 45):  28.2873
  Type-3 Benefit (x = 46 through 50): 83.689
  Overall Benefit (x = 1 through 50): 23.4336

• The x-ranges are just the 10%-80%-10% splits for the type 1/2/3 metrics
• There is some art to choosing the bounding box; for example, the ranges can be adjusted to diminish the effect of outliers, and restricted to remove data after the apparent asymptote

[Figure: sample learning curves plotting Average Reward over trials, with the x-axis divided into Type 1, Type 2, and Type 3 regions.]
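The sketch below shows one way to compute the benefit score described above from transfer and non-transfer learning curves. The trapezoidal integration, the y-range choice, and the synthetic example curves are assumptions; as the slide notes, choosing the bounding box involves some judgment.

```python
import numpy as np

def area_between(y1, y2, x):
    """Trapezoidal area between two curves sampled at points x."""
    d = np.asarray(y1, dtype=float) - np.asarray(y2, dtype=float)
    return float(np.sum((d[1:] + d[:-1]) / 2.0 * np.diff(x)))

def benefit(transfer, non_transfer, x_lo, x_hi):
    """Benefit = (area between curves) * 100 / (y-range * x-range).

    x_lo, x_hi are 1-based trial indices defining the bounding box (inclusive).
    The integration scheme and y-range convention are assumptions for this sketch.
    """
    t = np.asarray(transfer, dtype=float)[x_lo - 1 : x_hi]
    n = np.asarray(non_transfer, dtype=float)[x_lo - 1 : x_hi]
    x = np.arange(x_lo, x_hi + 1)
    y_range = max(t.max(), n.max()) - min(t.min(), n.min())
    x_range = x_hi - x_lo
    return area_between(t, n, x) * 100.0 / (y_range * x_range)

# Example with synthetic curves over 50 trials, scored on the type-1/2/3 splits above.
trials = np.arange(1, 51)
non_transfer = 1 - np.exp(-trials / 15.0)            # slower learner
transfer = 0.2 + 0.8 * (1 - np.exp(-trials / 10.0))  # jump start plus faster learning

for label, lo, hi in [("Type 1", 1, 5), ("Type 2", 6, 45),
                      ("Type 3", 46, 50), ("Overall", 1, 50)]:
    print(f"{label}: benefit = {benefit(transfer, non_transfer, lo, hi):.2f}")
```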

Transfer Targets

• Required Regret scores:

  TL    Year 2    Year 3
  6       30        40
  7       30        40
  8       20        30
  9       20        30
  10      20        30

• Targets concern the learning curve after the first element
• Aggregation method: average score >= target; two architectures must each report scores at all levels (see the sketch below)
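A minimal sketch of the aggregation rule as stated above: for each TL level, the reported scores are averaged and compared against the Year-2 target, and every architecture must report at every level. The architecture names, scores, and exact pass/fail rule are illustrative assumptions.

```python
# Year-2 regret targets per transfer level (from the table above).
TARGETS_Y2 = {6: 30, 7: 30, 8: 20, 9: 20, 10: 20}

# Hypothetical reported scores, keyed by architecture and then TL level.
scores = {
    "ICARUS": {6: 34, 7: 29, 8: 25, 9: 21, 10: 18},
    "Soar":   {6: 31, 7: 33, 8: 22, 9: 20, 10: 24},
}

def passes(level, target, scores):
    """Average over architectures meets the target, and every architecture reports."""
    if any(level not in arch for arch in scores.values()):
        return False
    avg = sum(arch[level] for arch in scores.values()) / len(scores)
    return avg >= target

for level, target in sorted(TARGETS_Y2.items()):
    print(f"TL {level}: target {target}, pass = {passes(level, target, scores)}")
```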

GGP Tasks

• Goals:
  • Foster careful, measurable science
  • Eliminate mechanical program risk

• Problem characteristics:
  • One-player game with >= 1 deterministic (but unknown) adversary
  • Defined within GGP ("internally simulated")
  • Turn-taking format
  • Deterministic dynamics
  • Fully observable state
  • Reliable sensing

Showcase Tasks

• Goals:
  • Establish DARPA relevance of the TL program
  • Showcase novel (and riskier) technology

• Domain characteristics:
  • Externally simulated domain (MadRTS)
  • >= 1 deterministic (but unknown) adversary
  • Pause/go format
  • Primarily deterministic dynamics (except combat)
  • Partial GDL descriptions of state and dynamics
  • Aggregate actions, actions over time
  • Fully or partially observable state (choice)
  • Reliable or non-reliable sensing (choice)

• What we don't know about domain characteristics:
  • What portion of the domain will GDL capture, and when?

Showcase Task Questions

1. Will showcase tasks have Go/No-go status?
   • If it is financially feasible to construct scenarios and interfaces, then include them as Go/No-go tests

2. What are the showcase tasks?
   • Tasks known to agents; domain engineering allowed
   • Dan O. suggests one core problem for each of three transfer types, with syntactic variants to supply statistical relevance; ISLE suggests two core tasks, total
   • Core tasks cooperatively defined with the evaluation team; variants defined by the evaluation team, within agreed parameters
   • The goal of the statistical test is to verify that agents robustly solve core problems; coverage of the transfer type in MadRTS is not an issue

3. Will all architectures address the showcase task?
   • Yes, pending a solution to the feasibility issue

Evaluation Architecture

• GameMaster+ supplies access to internal & external domains

[Figure: evaluation architecture. An Experimentation Manager exchanges experiment specifications, commands (e.g., start, pause, analyze), status messages, and analysis results with GameMaster+. GameMaster+ comprises the Game Manager, a GDL Simulator, and a Game Database, and connects to external simulators (e.g., MadRTS). It exchanges game rules, percepts, and actions with the TL Agent and LIET, along with commands (e.g., which scenario, what length pause) and status messages.]

• Will external/internal distinction be transparent to agent developers?
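For orientation, here is a hypothetical sketch of the percept/action exchange implied by the figure. The class and method names (GameMasterPlus, get_percepts, submit_action) are invented for illustration and do not correspond to the actual GGP/LIET API.

```python
class GameMasterPlus:
    """Stand-in for the internal (GDL) or external (MadRTS) simulator behind GameMaster+."""
    def __init__(self, game_rules):
        self.rules, self.step = game_rules, 0

    def get_percepts(self):
        return {"step": self.step, "rules": self.rules}

    def submit_action(self, action):
        self.step += 1
        return self.step >= 3          # pretend the match ends after three moves

class TLAgent:
    """Stand-in transfer-learning agent; chooses actions from percepts."""
    def choose_action(self, percepts):
        return f"noop@{percepts['step']}"

def run_match(gm, agent):
    # Percept/action loop: rules and percepts flow to the agent, actions flow back.
    done = False
    while not done:
        percepts = gm.get_percepts()
        action = agent.choose_action(percepts)
        done = gm.submit_action(action)

run_match(GameMasterPlus(game_rules="(role player) ..."), TLAgent())
```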

Experimental Protocol Outline

[Figure: experimental protocol flow. Agents (Human, ISLE, UT, UM) are run on source and target problems under a transfer condition and a non-transfer condition, producing learning curves. Metrics computed from the curves (transfer difference, scaled transfer difference, ratio, ARR narrow, ARR wide, transfer ratio, truncated transfer ratio, jump start, asymptotic advantage, benefit) yield agent transfer scores, and a statistical analysis reports significance with respect to the null hypothesis.]
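Several of the metrics named in the figure have conventional definitions in the transfer-learning literature. The sketch below computes three of them (jump start, asymptotic advantage, and transfer ratio) under those common definitions, which are assumptions here since the slides do not define them; the example curves are synthetic.

```python
import numpy as np

def area(y):
    """Trapezoidal area under a curve sampled at unit-spaced trials."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0))

def jump_start(transfer, non_transfer):
    """Difference in performance on the very first target trial."""
    return transfer[0] - non_transfer[0]

def asymptotic_advantage(transfer, non_transfer, tail=5):
    """Difference in final performance, averaged over the last few trials."""
    return float(np.mean(transfer[-tail:]) - np.mean(non_transfer[-tail:]))

def transfer_ratio(transfer, non_transfer):
    """Ratio of the areas under the transfer and non-transfer learning curves."""
    return area(transfer) / area(non_transfer)

# Synthetic learning curves for the transfer and non-transfer conditions.
trials = np.arange(1, 51)
non_transfer = 1 - np.exp(-trials / 15.0)
transfer = 0.2 + 0.8 * (1 - np.exp(-trials / 10.0))

print("jump start:          ", round(jump_start(transfer, non_transfer), 3))
print("asymptotic advantage:", round(asymptotic_advantage(transfer, non_transfer), 3))
print("transfer ratio:      ", round(transfer_ratio(transfer, non_transfer), 3))
```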

Protocol Constraints

• The protocol requires >= 2 targets per scenario
  • >= 1 to discover mappings (a cost)
  • Others to show transfer benefit

• The protocol requires >= 2 sources per scenario
  • >= 1 to ground all facets of the mapping
  • Others to ensure that the mapping is persistent

Claim: Deep transfer requires technology for detecting and exploiting persistent mappings between source and target domains.

Protocol Details

(O1, O2, ..., OO)  (T1, T2, ..., TT)
(O1, O2, ..., OO)  (T1, T2, ..., TT)
        ...
(O1, O2, ..., OO)  (T1, T2, ..., TT)

S = scenarios
O = sources per scenario
T = targets per scenario
M = matches per source or target
K = trials per scenario

Protocol time complexity = M(O + 2T)SK

Primary difference from Y1: scenarios contain sets of problems.

• Each line above is one scenario
• Three source and three target problems per scenario
• Seven scenarios per transfer type
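A worked instance of the complexity formula, using the figures above (O = 3, T = 3, S = 7) and leaving M and K symbolic: M(O + 2T)SK = M(3 + 6) * 7 * K = 63MK matches per transfer type.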

Sample Learning Curves
• INCLUDE SAMPLE LEARNING CURVES HERE, or on the previous slide

Protocol Pseudocode

S = number of scenarios
O = sources per scenario
T = targets per scenario
M = matches per source or target
K = trials per scenario

Protocol problem (time) complexity = M(O + 2T)SK

run_protocol(S, O, T, M, K, PO, PT) =
  for n = 1 to S                             ; For each scenario
    for k = 1 to K                           ; Number of permutations (trials)
      Randomize the order of source problems PO and target problems PT
      foreach c in {no_transfer, transfer}   ; For each experiment condition
        KB = {}                              ; Clear the knowledge base
        if (c == transfer)                   ; If transfer condition, train on source problems
          for i = 1 to O
            for m = 1 to M                   ; Number of matches
              KB = train(PO[i], KB)
        for j = 1 to T                       ; Train on target problems
          for m = 1 to M
            {KB, results[n, k, c, j, m]} = train_and_record(PT[j], KB)  ; Update KB and record result
  output(results)

This protocol is run for each transfer level.
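For readers who want to execute the protocol skeleton, here is a minimal runnable sketch of the pseudocode above in Python. The train and train_and_record stubs and the shape of the results record are placeholders, not the actual harness.

```python
import random
from itertools import product

def train(problem, kb):
    """Placeholder: play matches of `problem` and return the updated knowledge base."""
    return kb | {problem}

def train_and_record(problem, kb):
    """Placeholder: train on a target problem and return (updated KB, recorded score)."""
    score = 1.0 if kb else 0.0               # stand-in performance measure
    return kb | {problem}, score

def run_protocol(sources, targets, S, M, K):
    """Run S scenarios, K randomized trials each, under no-transfer and transfer conditions."""
    results = []
    for n in range(S):                                   # each scenario
        for k in range(K):                               # each trial (permutation)
            src = random.sample(sources, len(sources))
            tgt = random.sample(targets, len(targets))
            for condition in ("no_transfer", "transfer"):
                kb = set()                               # clear the knowledge base
                if condition == "transfer":
                    for p, _ in product(src, range(M)):  # train on sources, M matches each
                        kb = train(p, kb)
                for j, p in enumerate(tgt):              # train and record on targets
                    for m in range(M):
                        kb, score = train_and_record(p, kb)
                        results.append((n, k, condition, j, m, score))
    return results

print(run_protocol(["src-A", "src-B", "src-C"], ["tgt-A", "tgt-B", "tgt-C"],
                   S=1, M=1, K=1))
```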

Development Plan
• Separate Go/No-go and showcase paths
• Incremental model for each: build initial agent technology, evaluate against initial problems, repeat
• Key Go/No-go dates:
  • Initial experiments March 31
  • Revised problems April 30
  • Go/No-go tests completed August 24, 2007
• Key Demonstration dates:
  • Initial demonstration April 30
  • Revised problems June 1
  • Final demonstrations September 15

Integration Plan (Icarus)
• GRAPHIC SHOWING TECHNOLOGY from Maryland, UW, Rutgers, and Austin going into Icarus

Go/No-go Task Flow

[Figure: Gantt-style timeline from Dec through Sep for the Evaluation Team and the ISLE Team, with tracks for GameMaster+ development, experiment design, scenario (& TL) definition, and agent development. Milestones include: GameMaster 12/31; O,T,S 12/31; GDL 12/31, 1/31, 2/28; Who demos? 1/15; Sim & Agent/LIET Interfaces 2/15; Experimentation Manager 2/28; Initial Test 3/31; Feedback 4/15; O,T,S 4/30; GDL 4/30, 5/31; Problem Generator 6/30; Go/No-go 8/24.]

Demonstration Task Flow

[Figure: Gantt-style timeline from Dec through Sep for the Evaluation Team and the ISLE Team, with tracks for GameMaster+ development, experiment design, scenario construction, domain engineering, and agent development. Milestones include: Domain Choice 12/31; Simulator 1/15; Evaluation Plan 1/31, 5/31; Scenario Design 2/15, 3/15, 5/15, 6/15, 7/15; GameMaster+ and GDL+ 2/28, 3/31, 6/1, 7/1, 8/1; Initial Demo 4/30; Feedback 5/15; Sim & Agent/LIET Interfaces 6/15; Final Demo 9/15.]

Summary of Work Products, Dates

ISLE Team

  Demonstration Dates:
    Domain Choice 12/31; Evaluation Plan 1/31; Interface Engineering 2/15; Scenario Design 3/15; Initial Demonstration 4/30; Scenario Design 5/15; Evaluation Plan 5/31; Scenario Design 6/15; Interface Engineering 6/15; Scenario Design 7/15; Final Demonstrations 9/15

  Go/No-go Dates:
    Current GameMaster 12/31; GDL for Scenarios 12/31, 1/31, 2/28; Experimentation Manager 2/28; Feedback on first pass 4/15; GDL for Scenarios 4/30, 5/31; Problem Generator 6/30

Evaluation Team

  Demonstration Dates:
    Simulator 1/15; Interface Engineering 2/15; GameMaster+ and GDL+ for Scenarios 2/28, 3/31; Feedback on first pass 5/15; GameMaster+ and GDL+ for Scenarios 6/1; Interface Engineering 6/15; GameMaster+ and GDL+ for Scenarios 7/1, 8/1

  Go/No-go Dates:
    Experimental Design (O,T,S) 12/31; Initial Tests 3/31; Feedback on first pass 4/15; Experimental Design (O,T,S) 4/30; Go/No-go tests 8/24

Open Questions
• QUESTIONS for DARPA GO HERE

Darpa’s Proposed Schedule

Task Deadline Dec Jan Feb Mar Apr May Jun Jul Aug SepTest Harness

First cut of GGP/LIET API released 29-Dec-06 ………XFinal version of extended GameMaster w/ LIET 30-Mar-07 .…X

GDL+Roadmap and 1st version of extension 8-Jan-07 ……XFinal version of GDL 30-Mar-07 ....X

ScenariosInitial Scenarios Defined 22-Dec-06 ……….XFinal Scenario Defined 30-Mar-07 ….XScenario Generators Spec 2-Feb-07 XFirst Scenario Generator 28-Feb-07 ….XBulk of Scenario generators delivered 30-Mar-07 ………..XFinal Scenario deadline 4-May-07 .X

TestingTest Readiness Review 29-Jun-07 ……XGo/NoGo Testing begins 6-Aug-07 ..X

Go/NoGo Testing completed 24-Aug-07 ……XGo/NoGo Test results analyzed and reported 18-Sep-07 ……X

ProgrammaticYear 2 Kickoff ….XY3 Go/NoGo Briefing 28-Sep-07 ……X

Go/No-go Experiment Questions

• There are alternate experimental designs:
  • Learning curves track performance across scenarios; each score averages across target problems (O = 3, T = 3, S = 4, K = 7)
  • Learning curves track performance across iterations of a single match:
    • Expect transfer on the Tth target problem (O, T = 3; S = 7; K = 1; M >> 1)
    • Seek transfer on any target problem (O, T = 3; T*S >= 7; K = 1; M >> 1)

• Scenario requirements may be problematic
  • The Stanford Logic Group may be able to construct a scenario generator for Go/No-go tests; otherwise scenarios must be hand coded

• We'd prefer that the evaluation methodology not drive the technology
  • CPU time for evaluation is problematic if S, K, or M >> 1 and the time per cycle is slow
  • Current Icarus cycle times: UCT: 0.2 - 2 s; GGP: 3 - 90 s
  • All three tech teams use the same pool of resources (GameMaster and opponents) from the Stanford Logic Group

Demonstration Experiment Questions

• Issues:
  • GDL descriptions have a different status and may be less complete than for Go/No-go tasks
  • Domain scenarios may arrive later
  • Their numbers will be constrained (no generator)
  • The current statistical evaluation methodology may interfere with the demonstration goal (this might be an opportunity to introduce qualitative metrics in Y3)

• Evaluation options:
  • Measure transfer with the same protocol and metric
  • and/or use a more qualitative method entirely