
Page 1:

Learning Distinctions and Rules in a Continuous World through Active Exploration

Paper by Jonathan Mugan & Benjamin Kuipers

Presented by Daniel Hough

Page 2:

The Challenge

To build a robot that learns about its environment the way children do.

Piaget [1952] theorised that children construct this knowledge in stages.

Cohen [2002] proposed that children have a domain-general information processing system for bootstrapping knowledge.

Page 3:

Foundations

The focus of the work: how a developing agent can learn temporal contingencies in the form of predictive rules over events.

Watson [2001] proposed a model of contingencies based on his observations of infant behaviour:

Prospective temporal contingency: Event B tends to follow Event A with a likelihood greater than chance.

Retrospective temporal contingency: Event A tends to come before Event B more often than chance.

Distinctions must be found to determine when an event has occurred.

Page 4:

Foundations

Drescher [1991] proposed a model, inspired by Piaget, in which contingencies (here called schemas) are found using marginal attribution.

Results that follow actions are found in a manner similar to Watson's contingencies.

For each schema (in the form of an action + result), the algorithm searches for a context (situation) that makes the result more likely to follow that action.

Page 5:

The Method: Introduction

Here, prospective contingencies, as well as contingencies in which events occur simultaneously, are represented using predictive rules.

These rules are learned using a method inspired by marginal attribution.

The difference from Drescher's work is that the variables here are continuous.

This brings up the issue of determining when events occur, so distinctions must be found.

Page 6:

The Method: Introduction

The motor babbling method from last week could learn distinctions and contingencies.

But it was undirected, and does not allow learning for larger problems: too much effort is wasted on uninteresting portions of the state space.

Page 7:

The Method: Introduction

In this algorithm, the agent receives as input the values of time-varying continuous variables but can only represent, reason about and construct knowledge using discrete values.

Continuous values are discretised using distinctions in the form of landmarks:

There is a discrete value v(t) for each continuous variable v’(t). If, for landmarks v1 and v2, v1 < v’(t) < v2, then v(t) has the open interval between v1 and v2 as its value: v = (v1, v2).

This association means the agent can focus on changes of v, i.e. events.

The agent greedily learns rules that use one event to predict another.
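A minimal sketch of this discretisation step in Python (the function name discretize and its signature are illustrative, not from the paper):

```python
from bisect import bisect_left

def discretize(value, landmarks, tol=1e-9):
    """Map a continuous value v'(t) to a qualitative value v(t): either a
    landmark, or the open interval between two adjacent landmarks.
    `landmarks` must be a sorted list of floats."""
    for lm in landmarks:
        if abs(value - lm) <= tol:
            return lm                          # value sits on a landmark
    i = bisect_left(landmarks, value)
    lo = landmarks[i - 1] if i > 0 else float('-inf')
    hi = landmarks[i] if i < len(landmarks) else float('inf')
    return (lo, hi)                            # open interval (lo, hi)

print(discretize(0.4, [0.0, 1.0]))             # (0.0, 1.0)
print(discretize(1.0, [0.0, 1.0]))             # 1.0
```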

Page 8:

The Method: How it’s evaluated

The method is evaluated using a simulated robot based on the situation of a baby sitting in a high chair.

Fig. 1: Adorable
Fig. 2: Less adorable

Page 9:

The Method: Knowledge Representation & Learning

The goal is for the agent to learn to identify landmark values from its own experience.

The importance of a qualitative distinction is estimated from the reliability of the rules that can be learned, given that distinction.

The qualitative representation is based on QSIM [Kuipers, 1994].

Page 10:

The Method: Knowledge Representation & Learning

A continuous variable x’(t) ranges over some subset of the real number line (-∞, +∞), and is represented by a discrete variable x(t) for its magnitude and a discrete variable x’’(t) for the direction of change of x’(t).

In QSIM, magnitude is abstracted to a discrete variable x(t) that ranges over a quantity space Q(x) of qualitative values:

Q(x) = L(x) ∪ I(x)

where L(x) = {x1, ..., xn} is the set of landmark values, and I(x) = {(-∞, x1), (x1, x2), ..., (xn, +∞)} is the set of mutually disjoint open intervals.

Page 11:

The Method: Knowledge Representation & Learning

A quantity space with two landmarks, {x1, x2}, implies five distinct qualitative values,

Q(x) = {(-∞,x1),x1,(x1,x2),x2,(x2, +∞)}

A discrete variable x’’(t) for the direction of change of x’(t) has a single intrinsic landmark at 0, so its initial quantity space is

Q(x’’) = {(-∞, 0), 0, (0, +∞)}
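A small sketch of such a quantity space in Python; the class name QuantitySpace and its methods are illustrative, not from the paper:

```python
NEG_INF, POS_INF = float('-inf'), float('inf')

class QuantitySpace:
    """A QSIM-style quantity space: landmarks plus the open intervals
    between them."""
    def __init__(self, landmarks):
        self.landmarks = sorted(landmarks)     # L(x) = {x1, ..., xn}

    def values(self):
        """Q(x) = L(x) U I(x), as the alternating sequence
        (-inf, x1), x1, (x1, x2), x2, ..., xn, (xn, +inf)."""
        pts = [NEG_INF] + self.landmarks + [POS_INF]
        out = []
        for lo, hi in zip(pts, pts[1:]):
            out.append((lo, hi))               # open interval
            if hi != POS_INF:
                out.append(hi)                 # landmark value
        return out

print(QuantitySpace([1.0, 2.0]).values())
# [(-inf, 1.0), 1.0, (1.0, 2.0), 2.0, (2.0, inf)] -- five values, as above
```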

Page 12:

The Method: Knowledge Representation & Learning: Events

If a is the qualitative value of a discrete variable A, meaning a ∈ Q(A), then the event A_t → a is defined by A(t − 1) ≠ a and A(t) = a.

That is, an event takes place when a discrete variable A changes to value a at time t, from some other value.
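A minimal sketch of event detection over a discretised trace (detect_events is an illustrative name, not from the paper):

```python
def detect_events(trace):
    """Given a sequence of qualitative values A(0), A(1), ... of one
    discrete variable, yield an event (t, a) whenever
    A(t - 1) != a and A(t) = a."""
    for t in range(1, len(trace)):
        if trace[t] != trace[t - 1]:
            yield (t, trace[t])

# Two events: at t=2 the value becomes 1.0, at t=3 it becomes (1.0, 2.0).
trace = [(0.0, 1.0), (0.0, 1.0), 1.0, (1.0, 2.0)]
print(list(detect_events(trace)))   # [(2, 1.0), (3, (1.0, 2.0))]
```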

Page 13:

The Method: Knowledge Representation & Learning: Predictive Rules

This is how temporal contingencies are described.

There are two types of predictive rules:

Causal: one event occurs after another, later in time.

Functional: the events are linked by a function, so they happen at the same time.
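One possible in-memory representation of these rules, as a Python dataclass; the class and field names are assumptions for illustration, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass
class PredictiveRule:
    """A predictive rule: in some context, the antecedent event u
    predicts the consequent event h."""
    context: Tuple[Any, ...]   # discrete values that must hold
    antecedent: Any            # event u, e.g. ('x', (1.0, 2.0))
    consequent: Any            # predicted event h
    causal: bool               # True: h follows u later in time;
                               # False (functional): same timestep
```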

Page 14:

The Method: Learning a predictive rule

The agent wants to learn a rule that predicts a certain event h.

It looks at other events: if it finds one, u, after which h is more likely than otherwise, it creates a rule with that event as the antecedent. It does so by starting with an initial rule with no context (see the sketch below).
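A rough sketch of that search in Python, assuming simple co-occurrence counts as the reliability estimate; best_antecedent, the event-log format, and the time window are illustrative rather than the paper's exact statistics:

```python
from collections import Counter

def best_antecedent(event_log, h, window=1):
    """Pick the event u that most reliably precedes h within `window`
    timesteps. `event_log` is a list of (t, event) pairs."""
    follows, occurs = Counter(), Counter()
    times_of_h = {t for t, e in event_log if e == h}
    for t, u in event_log:
        if u == h:
            continue
        occurs[u] += 1
        # Did h happen within `window` timesteps after this occurrence of u?
        if any(t < th <= t + window for th in times_of_h):
            follows[u] += 1
    # Crude estimate of P(h soon | u) for each candidate antecedent.
    scored = {u: follows[u] / occurs[u] for u in occurs}
    return max(scored, key=scored.get) if scored else None
```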

Page 15:

The Method: Landmarks

When a new landmark x* is inserted into Q(x), we replace one interval with two intervals and the dividing landmark: the interval (xi, xi+1) containing x* becomes (xi, x*), x*, (x*, xi+1).

Whenever a new landmark is inserted, statistics about the previous state space are thrown out and new ones are built up. This means that the reliability of each rule must be re-checked.
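Under the illustrative QuantitySpace representation sketched earlier, insertion amounts to adding the landmark to the sorted list; the stale-statistics step is noted in a comment:

```python
def insert_landmark(space, x_star):
    """Insert a new landmark x* into Q(x). The interval (xi, xi+1)
    containing x* is implicitly replaced by (xi, x*), x*, (x*, xi+1)."""
    if x_star not in space.landmarks:
        space.landmarks.append(x_star)
        space.landmarks.sort()
    # Statistics gathered over the old intervals are now stale: the agent
    # discards them, rebuilds them from new experience, and re-checks
    # rule reliability.
```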

Page 16:

The Method: The Learning Process

1. Do 7 times:
   a) Actively explore the world for 1000 timesteps, with a set of candidate goals coming from the discrete variables in M
   b) Learn new causal and functional rules
   c) Learn new landmarks by examining the statistics stored in rules and events
2. Gather 3000 more timesteps of experience to solidify the learned rules
3. Update the strata
4. Go to 1
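Schematically, in Python; every helper here is a stub standing in for one of the paper's procedures, so only the control structure is meaningful:

```python
def candidate_goals(agent): return []                    # goals from variables in M
def explore(agent, world, goals, timesteps): pass        # active exploration
def learn_causal_and_functional_rules(agent): pass
def learn_landmarks(agent): pass                         # from rule/event statistics
def gather_experience(agent, world, timesteps): pass
def update_strata(agent): pass

def learning_process(agent, world, iterations=3):
    # The slide loops indefinitely ("Go to 1"); a bound is used here.
    for _ in range(iterations):
        for _ in range(7):
            explore(agent, world, candidate_goals(agent), timesteps=1000)
            learn_causal_and_functional_rules(agent)
            learn_landmarks(agent)
        gather_experience(agent, world, timesteps=3000)  # solidify rules
        update_strata(agent)
```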

Page 17:

Evaluation: Experimental Setup

The robot has two motor variables, one for each of its degrees of freedom.

A perceptual system creates variables for each of the two tracked objects in the environment: the hand and the block.

There are too many variables to reasonably explain here; each has various constraints.

During learning, if the block is knocked off the tray or is not moved for 300 timesteps, it is put back on the tray in a random position within reach of the agent.

Page 18:

Evaluation: Experimental Results

The algorithm was evaluated using the simple task of moving the block in a specified direction.

It was run five times using passive learning and five using active learning, and each run lasted 120,000 timesteps.

Each active run of the algorithm resulted in an average of 62 predictive rules.

The agent gains proficiency as it learns, until reaching a plateau at approximately 70,000 timesteps for both.

Page 19:

Evaluation: Experimental Results

Active exploration does better: at 40,000 timesteps, active learning reaches the level that passive learning only reaches at 60,000 timesteps.

Page 20:

The Complexity of Space and Time

The storage required to learn new rules is O(e²) in the number of events e, as is the number of candidate rules, but only a small number of rules are actually learned by the agent.

Using marginal attribution, each rule requires O(e) storage, although all pairs of events are stored for simplicity.

Page 21:

Conclusion

At first, the agent could only determine the direction of movement of an object.

Through active exploration of its environment, using rules to learn distinctions and then using those distinctions to learn more rules, the agent progressed from a very simple representation towards a representation that is aligned with the natural “joints” of its environment.