Learning Modal Continuous Models
Joseph Xu, Soar Workshop 2012



Setting: Continuous Environment

• Input to the agent is a set of objects with continuous properties
  – Position, rotation, scaling, ...

• Output is fixed-length vector of continuous numbers

• Agent runs in lock-step with environment

• Fully observable

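[Figure: the agent and environment exchange data at each time step. The agent's output is a vector of continuous numbers (e.g., -9.0, 5.8). Its input is a scene of objects A and B, each described by position (px, py, pz) and rotation (rx, ry, rz) values such as 0.2, 1.2, 0.0 and 3.4, 3.9, 0.0.]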

Levels of Problem Solving

[Figure: problem-solving methods arranged from slower task completion with specific solutions to faster task completion with general solutions. Columns: Problem Solving Method, Knowledge Required, Characteristics. Methods shown: Motor Babbling; Continuous Sampling Methods (RRT); Symbolic Model Free Methods (RL); Symbolic Planning. Knowledge shown: None; Goal Recognition; Continuous Model; Symbolic Abstraction; Symbolic Model.]

Continuous Model Learning

• Learn a function f(x, u) = y
• x: current continuous state vector
• u: current output vector
• y: state vector in the next time step

[Figure: the continuous state X and output U are mapped to the next state Y.]

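Locally Weighted Regression

[Figure: given the current state x and a motor command u (left voltage: -0.6, right voltage: 1.2), the unknown next state is predicted by finding the k nearest neighbors and fitting a weighted linear regression.]

$f(x, u) = \sum_i w_i x_i + \sum_j w_j u_j$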
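A minimal sketch of locally weighted regression follows, under stated assumptions: each training example is a `(features, y)` pair where `features` concatenates x and u, neighbors are weighted with a Gaussian kernel, and an intercept term plus a tiny ridge term are added for numerical stability. None of these choices are taken from the talk, and the function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def lwr_predict(examples, query, k=5, ridge=1e-9):
    """Predict y at `query` from the k nearest (features, y) examples."""
    nearest = sorted(examples, key=lambda e: math.dist(e[0], query))[:k]
    # Kernel bandwidth: distance to the farthest selected neighbor (assumption).
    bandwidth = max(math.dist(f, query) for f, _ in nearest) or 1.0
    dim = len(query) + 1  # +1 for the intercept term (assumption)
    ata = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    aty = [0.0] * dim
    for features, y in nearest:
        w = math.exp(-(math.dist(features, query) / bandwidth) ** 2)
        row = list(features) + [1.0]
        for i in range(dim):
            aty[i] += w * row[i] * y
            for j in range(dim):
                ata[i][j] += w * row[i] * row[j]
    coeffs = solve_linear_system(ata, aty)  # weighted normal equations
    return sum(c * v for c, v in zip(coeffs, list(query) + [1.0]))


# Toy data from a single linear function y = 2x + 3u + 1.
examples = [((float(a), float(b)), 2.0 * a + 3.0 * b + 1.0)
            for a in range(5) for b in range(5)]
prediction = lwr_predict(examples, (1.5, 2.5), k=5)
```

On data generated by a single linear function the local fit recovers that function almost exactly; the difficulties arise when the neighbors come from different kinds of interactions.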

Problems with LWR

• Euclidean distance doesn’t capture relational similarity

• Averages over neighbors exhibiting different types of interactions

[Figure: a query scene, its nearest neighbor scenes, and the resulting LWR prediction.]

Modal Models

• Object behavior can be categorized into different modes
  – Behavior within a single mode is usually simple and smooth (inertia, gravity, etc.)
  – Behaviors across modes can be discontinuous and complex (collisions, drops)
  – Modes can often be distinguished by discrete spatial relationships between objects
• Learn two-level models composed of:
  – A classifier that determines the active mode using spatial relationships
  – A set of linear functions (initial hypothesis), one for each mode
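The two-level structure can be sketched as a classifier that picks a mode, followed by a lookup of that mode's linear function. Everything concrete here is a hypothetical illustration: the mode names, the hand-written classifier, the one-dimensional features `[pushed_x, dx]`, and the coefficients stand in for what a learner would produce.

```python
def classify_mode(relations):
    """Hypothetical classifier: the pushing mode is active only on contact."""
    return "push" if relations["touch(A,B)"] == 1 else "free"


# One linear function (weights, bias) per mode, over features [pushed_x, dx].
# The coefficients are illustrative assumptions, not learned values.
MODE_MODELS = {
    "free": ([1.0, 0.0], 0.0),   # pushed block stays where it is
    "push": ([1.0, 1.0], 0.0),   # pushed block moves by dx
}


def predict_next(relations, features):
    """Two-level prediction: choose the mode, then apply its linear function."""
    weights, bias = MODE_MODELS[classify_mode(relations)]
    return sum(w * f for w, f in zip(weights, features)) + bias


touching = predict_next({"touch(A,B)": 1}, [2.0, 0.5])
apart = predict_next({"touch(A,B)": 0}, [2.0, 0.5])
```

Each mode's function stays simple because the discontinuity between "free" and "push" is handled by the classifier rather than by the regression.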

[Figure: a scene is passed to the mode classifier, which selects the Mode 1, Mode 2, or Mode 3 model to produce the prediction.]

Unsupervised Learning of Modes From Data

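[Figure: the environment switches between Mode 1 and Mode 2 over time. Each training example pairs continuous features (e.g., 0.5, 1.1, -0.2, 4, 17) with a target y (e.g., 21.9). Expectation Maximization separates the data into Learned Mode 1 and Learned Mode 2.]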

Expectation Maximization

• Expectation: assuming the current model parameters are correct, what is the likelihood that model m generated data point i?
• Maximization: assuming each data point was generated by its most probable model, modify each model’s parameters to maximize the likelihood of generating the data
• Iterate until convergence to a local maximum
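The two steps above can be sketched for the simplest case of one-dimensional linear modes. The sketch assumes equal-variance Gaussian noise, so the most probable model for a point is the one with the smallest squared error; the function names, the toy data, and the initial models are all hypothetical.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b; returns None if underdetermined."""
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return None
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return a, mean_y - a * mean_x


def squared_error(model, point):
    a, b = model
    x, y = point
    return (y - (a * x + b)) ** 2


def total_error(points, models):
    """Sum over points of the error under each point's best model."""
    return sum(min(squared_error(m, p) for m in models) for p in points)


def hard_em(points, models, max_iters=50):
    """Alternate hard assignments (E) and per-mode refits (M)."""
    assignment = None
    for _ in range(max_iters):
        # E step: assign each point to its most probable (lowest-error) model.
        new_assignment = [
            min(range(len(models)), key=lambda m: squared_error(models[m], p))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        # M step: refit each model to the points assigned to it.
        for m in range(len(models)):
            members = [p for p, a in zip(points, assignment) if a == m]
            refit = fit_line(members)
            if refit is not None:
                models[m] = refit
    return models, assignment


# Toy data from two modes: y = 2x and y = 10 - x.
points = [(float(x), 2.0 * x) for x in range(5)] + \
         [(float(x), 10.0 - x) for x in range(5)]
initial = [(1.0, 0.0), (0.0, 8.0)]
models, assignment = hard_em(points, list(initial))
```

Neither step can increase the total squared error, so the procedure converges, but only to a local optimum: depending on the starting models, a few points may end up assigned to the wrong mode.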

Learning Classifier

[Figure: the scene at each time step is encoded as binary spatial relation attributes, e.g., left-of(A,B) = 1, right-of(A,B) = 0, on-top(A,B) = 0, touch(A,B) = 0 (an attribute vector such as 1000101011011). The mode assigned to each training example by Expectation Maximization (Learned Mode 1 or Learned Mode 2; e.g., the label sequence 1111222211) becomes its class.]
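As a rough illustration of how a scene might be turned into the binary attributes named on this slide, the sketch below computes four relations from axis-aligned bounding boxes. The `Box` type, the tolerance `eps`, and the exact geometric definition of each relation are assumptions; the talk does not define them.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical representation)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def spatial_relations(a, b, eps=1e-6):
    """Return binary spatial relations between boxes a and b."""
    overlap_x = a.min_x < b.max_x and b.min_x < a.max_x
    left_of = a.max_x <= b.min_x + eps
    right_of = a.min_x >= b.max_x - eps
    on_top = a.min_y >= b.max_y - eps and overlap_x
    # Touching: the boxes meet or overlap once expanded by eps.
    touch = (a.min_x <= b.max_x + eps and b.min_x <= a.max_x + eps and
             a.min_y <= b.max_y + eps and b.min_y <= a.max_y + eps)
    return {
        "left-of(A,B)": int(left_of),
        "right-of(A,B)": int(right_of),
        "on-top(A,B)": int(on_top),
        "touch(A,B)": int(touch),
    }


separated = spatial_relations(Box(0, 0, 1, 1), Box(2, 0, 3, 1))
adjacent = spatial_relations(Box(0, 0, 1, 1), Box(1, 0, 2, 1))
```

Pairing each attribute dictionary with the mode label of the same time step yields one classifier training example.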

[Figure: the classifier training data (attributes and class) is used to learn a decision tree over spatial relations. The tree tests touch(A, B) and left-of(A, B), and its leaves are mode 1, mode 2, and mode 2.]

Use the linear model for items in the same mode
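A decision tree over binary spatial relations can be learned with a small ID3-style procedure. This is a generic sketch, not necessarily the learner used in the talk, and the five training rows are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """rows: list of dicts mapping attribute name -> 0/1. Returns a nested tree."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority mode

    def split_entropy(attr):
        total = 0.0
        for value in (0, 1):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            if subset:
                total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(attributes, key=split_entropy)  # lowest remaining entropy
    remaining = [a for a in attributes if a != best]
    majority = Counter(labels).most_common(1)[0][0]
    branches = {}
    for value in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        if idx:
            branches[value] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx], remaining)
        else:
            branches[value] = majority  # no data on this branch
    return (best, branches)


def classify(tree, row):
    """Walk the tree until a leaf (a mode label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical training data: attributes per time step, mode label as class.
rows = [
    {"touch(A,B)": 0, "left-of(A,B)": 1},
    {"touch(A,B)": 0, "left-of(A,B)": 1},
    {"touch(A,B)": 0, "left-of(A,B)": 0},
    {"touch(A,B)": 1, "left-of(A,B)": 1},
    {"touch(A,B)": 1, "left-of(A,B)": 0},
]
modes = [1, 1, 1, 2, 1]
tree = build_tree(rows, modes, ["touch(A,B)", "left-of(A,B)"])
```

At prediction time, `classify(tree, relations)` gives the active mode, and the linear model for that mode produces the continuous prediction.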

Prediction Accuracy Experiment

• 2 Block Environment
  – Agent has two outputs (dx, dy) which control the x and y offsets of the controlled block at every time step
  – The pushed block can’t be moved except by pushing it with the controlled block
  – Blocks are always axis-aligned, and there’s no momentum
• Training
  – Instantiate Soar agent in a variety of spatial configurations
  – Run 10 time steps; each step is a training example
• Testing
  – Instantiate Soar agent in some configuration
  – Check accuracy of prediction for next time step
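A toy version of such an environment can be sketched as follows. The block size, the overlap test, and the rule that a pushed block moves by the full (dx, dy) offset are simplifying assumptions made for this example; the actual environment used in the experiment is not specified at this level of detail.

```python
def overlaps(a, b, size=1.0):
    """True if two axis-aligned square blocks (given by center) overlap."""
    return abs(a[0] - b[0]) < size and abs(a[1] - b[1]) < size


def step(controlled, pushed, dx, dy, size=1.0):
    """Advance one time step; there is no momentum."""
    moved = (controlled[0] + dx, controlled[1] + dy)
    if overlaps(moved, pushed, size):
        # Assumption: the pushed block is displaced by the same offset.
        pushed = (pushed[0] + dx, pushed[1] + dy)
    return moved, pushed


def run_episode(controlled, pushed, commands):
    """Collect one (x, u, y) training example per time step."""
    examples = []
    for dx, dy in commands:
        x = (*controlled, *pushed)                  # state before the step
        controlled, pushed = step(controlled, pushed, dx, dy)
        examples.append((x, (dx, dy), (*controlled, *pushed)))
    return examples


# Ten time steps of pushing to the right yield ten training examples.
examples = run_episode((0.0, 0.0), (2.0, 0.0), [(0.5, 0.0)] * 10)
```

The episode contains both behaviors: for the first two steps the pushed block stays put, and from the third step on it moves with the controlled block.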

Prediction Accuracy – Pushed Block

[Figure: average error (log scale, 1E-8 to 1E+4) versus number of training scenarios (10 to 80) for the series MM x, MM y, SM x, and SM y.]

Classification Performance

[Figure: number of classification errors (0 to 9) versus number of training scenarios (0 to 90) for the series X errors and Y errors.]

Prediction Performance Without Classification Errors

[Figure: average error (log scale, 1E-08 to 1E+04) versus number of training scenarios (0 to 90) for the series Best X, Best Y, Real X, and Real Y.]

Levels of Problem Solving

[Figure: the levels-of-problem-solving diagram, repeated from earlier in the talk.]

Symbolic Abstraction

• Lump continuous states sharing symbolic properties into a single symbolic state
• Should be Predictable
  – Planning requires accurate model (ex. STRIPS operators)
  – Tends to require more states, more symbolic properties
• Should be General
  – Fast planning and transferable solutions
  – Tends to require fewer states, fewer symbolic properties

[Figure: continuous configurations of C1 and C2 grouped into two symbolic states. S1: intersect(C1, C2); S2: ~intersect(C1, C2).]
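The lumping of continuous states into S1 and S2 can be sketched as a function from continuous positions to a symbol. The square-block geometry, the `size` parameter, and the sample positions are assumptions chosen for illustration.

```python
from collections import defaultdict


def intersect(c1, c2, size=1.0):
    """True if two axis-aligned square blocks (given by center) intersect."""
    return abs(c1[0] - c2[0]) < size and abs(c1[1] - c2[1]) < size


def symbolic_state(c1, c2):
    """Map a continuous configuration to one of two symbolic states."""
    return "S1" if intersect(c1, c2) else "S2"


def group_states(states):
    """Lump continuous states that share the same symbolic state."""
    groups = defaultdict(list)
    for c1, c2 in states:
        groups[symbolic_state(c1, c2)].append((c1, c2))
    return dict(groups)


states = [
    ((0.0, 0.0), (0.5, 0.2)),   # intersecting
    ((0.0, 0.0), (3.0, 0.0)),   # apart
    ((1.0, 1.0), (1.4, 1.9)),   # intersecting
    ((2.0, 2.0), (2.0, 5.0)),   # apart
]
groups = group_states(states)
```

Using a single predicate gives a very general abstraction with only two states; adding more predicates would split these groups further and trade generality for predictability.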

Symbolic Abstraction

• Hypothesis: a contiguous region of continuous space that shares a single behavioral mode makes a good abstract state
  – Planning within modes is simple because of linear behavior
  – Combinatorial search occurs at symbolic level
• Spatial predicates used in the continuous model’s decision tree are a reasonable approximation

Abstraction Experiment

• 3 blocks, goal is to push c2 to t
• Demonstrate a solution trace to agent
• Agent stores sequence of abstract states in solution in epmem
• Agent tries to follow plan in analogous task
• Abstraction should include predicates about c1, c2, t, and avoid predicates about d1, d2, d3

[Figure: example task layouts containing blocks C1 and C2, target t, and objects d1, d2, d3.]
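One way to picture the stored plan is as a sequence of abstract states built from a demonstrated trace. In the sketch below the relevant objects are supplied by hand, the facts are made up, and a Python list stands in for episodic memory; how the agent actually selects predicates and stores the sequence is not shown in the talk.

```python
def abstract_trace(trace, relevant):
    """Turn a demonstrated trace into a sequence of abstract states.

    trace: list of sets of (predicate, obj1, obj2) facts, one set per step.
    relevant: objects whose predicates are kept in the abstraction.
    """
    plan = []
    for facts in trace:
        # Keep only facts whose arguments are all relevant objects.
        state = frozenset(f for f in facts if set(f[1:]) <= relevant)
        # Record a new abstract state only when it differs from the last one.
        if not plan or plan[-1] != state:
            plan.append(state)
    return plan


# Hypothetical demonstration: c1 approaches c2, then pushes it onto t.
trace = [
    {("left-of", "c1", "c2"), ("touch", "c1", "d1")},
    {("left-of", "c1", "c2")},
    {("left-of", "c1", "c2"), ("touch", "c1", "c2")},
    {("left-of", "c1", "c2"), ("touch", "c1", "c2"),
     ("touch", "c2", "t"), ("left-of", "d2", "d3")},
]
plan = abstract_trace(trace, {"c1", "c2", "t"})
```

Because facts about d1, d2, and d3 are dropped, the same plan can match an analogous task in which those objects sit elsewhere.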

Generalization Performance

[Figure: bar chart of number of tasks solved (80 tasks total) by abstraction type. Bars, in the order labeled: Learned 28.1, 10 Rnd 1.7, 40 Rnd 7, 80 Rnd 10.1, All 10.3. The chart is also annotated "(16 average)".]

Conclusions

• For continuous environments with interacting objects, modal models are more general and accurate than uniform models
• The relationships that distinguish between modes serve as a useful symbolic abstraction over continuous state
• All this work takes Soar toward being able to autonomously learn and improve behavior in continuous environments

Evaluation

Coal
• Scaling issues: linear regression is exponential in the number of objects
• Linear modes are insufficient for more complex physics such as bouncing, which leads to catastrophic failure

Nuggets
• Modal model learning is more accurate and general than uniform models
• Abstraction learning results are promising, but preliminary