2
Outline of the talk
General Theory and Philosophy
Architecture: Sensory Mapping, Cognitive Mapping, Sensorimotor System
Experiments and Results: Reinforcement Learning, Action Chaining
3
Developmental Robotics
Investigates models from developmental psychology and developmental neuroscience
Applies insights from studies in ontogenetic neuroscience
Many different studies: social interaction [Fong 2003], sensorimotor control [Weng 2004, Metta 2003], categorization [Pfeifer 1999], value systems [Pfeifer 1999, Sporns 2002], morphological changes and motor skill acquisition [Lungrella 2002]
Discussed in this talk: Autonomous Mental Development (AMD) [Weng 2001] as one approach to developmental robotics
4
Machine Development Paradigms
Manual development: given a task T and ecological conditions Ec,
the human developer H understands the task T and programs the agent: A = H(T, Ec)
Task-specific architecture, representation and skills are developed by human hands
Autonomous development: given ecological conditions Ec; the task is unknown
The internal representation cannot be predefined
The human developer H writes a task-nonspecific developmental program for the newborn agent: A(0) = H(Ec)
The task has to be understood by the agent itself
After "birth", human teachers can affect the behavior of the robot by: supervised learning, reinforcement learning, communicative learning
5
SASE Agents
Self-Aware and Self-Effecting agent: additionally has an internal environment (the brain)
Internal sensors and internal effectors in addition to external sensors and effectors
E.g. attention control and action release are internal actions
All conscious internal actions have corresponding internal sensors
Both the internal and external environments are used for perception and cognition
6
Internal Representation
Symbolic representation (traditional AI): world-centered: describes an object in the external world with a unique, predefined set of attributes. Each component of the representation has a predefined meaning.
Distributed representation (AMD):
Body-centered: grown from the body's sensors and effectors
Vector form: A = (v1, v2, …, vn), consisting of sensory input and motor control output (or a function of both)
The representation of an object is distributed over many cortical areas
Used by developmental programs because the task is unknown at programming time
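As an illustration, a distributed representation in this sense is just a numeric vector grown from the body's sensors and effectors; the sketch below (function and variable names are ours, not from the AMD papers) builds such a context vector:

```python
import numpy as np

def make_context(sensory, motor):
    """Concatenate sensory input and motor output into one
    body-centered vector A = (v1, ..., vn). Illustrative sketch:
    no component carries a predefined symbolic meaning."""
    return np.concatenate([np.asarray(sensory, dtype=float),
                           np.asarray(motor, dtype=float)])

A = make_context([0.2, 0.7, 0.1], [1.0, -0.5])
```

The point of the vector form is that A is just a point in the agent's sensorimotor space, unlike a symbolic attribute list.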
7
Architecture: past and future contexts
System receives the last context as the input vector: l(t) = <xl(t), al(t)>
xl(t), al(t): last sensation and last action; these include internal sensors and actions
8
Architecture: Primed Contexts
System predicts the primed (future) context: p(t) = <xp(t), ap(t), Q(xp(t), ap(t))>
Predicting a single primed context is not sufficient: there may be multiple future possibilities
Reality mapping R: {p1(t), p2(t), …, pk(t)} = R(l(t))
Value system V: selects a desirable context pi(t) based on its Q-value
A second mapping F covers the far future
R and F are developed incrementally through experience
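The selection step of the value system V can be sketched as follows; the tuple layout and the function name are illustrative assumptions, not the papers' implementation:

```python
# Each primed context is modeled as a tuple (x_p, a_p, Q).
# The value system simply picks the candidate with the highest Q-value.
def value_system(primed_contexts):
    return max(primed_contexts, key=lambda c: c[2])

candidates = [(("x1",), ("a1",), 0.3),
              (("x2",), ("a2",), 0.9),
              (("x3",), ("a3",), 0.1)]
best = value_system(candidates)  # the context with Q = 0.9
```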
9
Sensory Mapping: Staggered Hierarchical Mapping
High-dimensional sensory input: visual: 100 x 100 images = 10000 dimensions; auditory: 300 – 1000 dimensions
Appropriate (autonomous) feature extraction is needed
Inspired by the early human visual pathways
Uses incremental PCA in receptive fields
Filters are applied to the receptive fields; each filter is given by an eigenvector from the PCA calculation
10
Sensory Mapping: Staggered Receptive Fields
Non-overlapping: many filters for one receptive field (RF); all eigenvectors are used for one RF; low resolution
Overlapping (staggered): one filter (a specific eigenvector) per RF; trades off resolution against the dimension of the feature space
11
Sensory Mapping: Layered Structure
Neurons of each layer are organized in a 2-D array (resembles the structure of images)
For each neuron of layer k, localized connections are made to the neurons of layer k-1
At any position with any scale, a neuron can be found whose receptive field approximately covers the region.
12
Sensory Mapping: CCIPCA
Candid Covariance-Free Incremental Principal Component Analysis
Standard PCA is a batch method and therefore not applicable to developmental learning
The input is usually very high-dimensional: the covariance matrix cannot be computed in real time
Standard PCA computes the eigen-directions (directions of maximal variance) of the sample data
The eigen-directions are the eigenvectors of the covariance matrix A = E[u(t) uT(t)] (u(t): zero-mean sample distribution)
13
Sensory Mapping: Incremental PCA
Calculate the 1st eigenvector:
Converges to the eigenvector with the largest eigenvalue
No convergence if the eigenvalues are equal
Higher-order eigenvectors:
Subtract the projection of the data sample onto the lower-order eigenvectors
Apply the same algorithm as for the first eigenvector
Eigen-directions satisfy λ v = A v with A = E[u(t) uT(t)].
Incremental estimate of the first eigenvector from the i-th sample u(i):
v(i) = ((i−1)/i) v(i−1) + (1/i) u(i) uT(i) v(i−1) / ||v(i−1)||
Residual used to estimate the next higher-order eigenvector:
u_{i+1}(i) = u_i(i) − [uT_i(i) v_i(i)/||v_i(i)||] v_i(i)/||v_i(i)||
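The incremental recursion can be sketched in a few lines; this is a simplification that omits the amnesic averaging parameters of the full CCIPCA algorithm, and the function names are ours:

```python
import numpy as np

def ccipca_step(v, u, n):
    """One incremental update of the eigenvector estimate v using
    the n-th zero-mean sample u; v is None before the first sample.
    Sketch of the CCIPCA recursion without amnesic averaging."""
    if v is None:
        return u.copy()
    return (n - 1) / n * v + (1.0 / n) * u * (u @ v) / np.linalg.norm(v)

def residual(u, v):
    """Subtract u's projection onto v, so the next (higher-order)
    eigenvector can be estimated from what remains."""
    vn = v / np.linalg.norm(v)
    return u - (u @ vn) * vn
```

Fed a stream of zero-mean samples, v converges toward the direction of maximal variance, scaled by its eigenvalue.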
14
Sensory Mapping: Eigengroups
The eigengroup of layer k is defined as n x n, where n is the maximum distance between two neurons whose input regions still overlap
Define the eigenvector number for each neuron (filter) in an eigengroup
Usually this ordering is the same for all eigengroups in a layer
Calculate the eigenvectors incrementally with CCIPCA in the predefined order, subtracting the projection of the data onto the corresponding eigenvectors
Inhibition of nearby neurons: detects different, statistically uncorrelated features
Output: the product of the input vector with the eigenvector, passed through a sigmoidal function
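The neuron output described above can be sketched as follows (the exact sigmoid used in the SHM may differ from the logistic function assumed here):

```python
import numpy as np

def neuron_response(x, filt):
    """Response of one sensory-mapping neuron: project the receptive
    field input x onto the neuron's eigenvector filter, then squash
    the projection with a logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x @ filt)))
```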
15
Sensory Mapping: Eigengroups
Eigengroups, sharing method: a single set of filters for all eigengroups
Applied to 5000 natural images
The first several filters are similar to biological receptive-field patterns
16
Sensory Mapping: Selective Attention
Each sensory mapping unit has internal attention effectors
Layered structure: a clear-cut attended region cannot be defined in the input space
Attended region: a 3-D ellipsoid centered at (x, y, l), where l is the layer
Experiments with occlusion:
Occlude either the upper or the lower half of face images
Separate SHMs were used for the different occlusions
Significantly outperforms the approach without attention control
17
Cognitive Mapping: Incremental Hierarchical Discriminant Regression (IHDR)
Cognitive mapping: f : X → Y
X: space of last contexts; Y: space of primed contexts
Find discriminant features in the input space
High-dimensional input space: classical decision trees are not applicable
Modelled by a Hierarchical Discriminant Regression tree
18
Cognitive Mapping: IHDR Tree
Each node contains x-clusters and corresponding y-clusters
The y-clusters determine the virtual class labels, which define to which cluster pair an example (x, y) belongs
The x-clusters approximate the sample population in the X-space
At most q clusters of each type per node
A child node is spawned from the current node if a finer approximation is required
None of the clusters keeps the actual input samples; only first-order statistics are used
19
Cognitive Mapping: HDR Tree
Building the tree for a set of samples S:
Cluster the y-vectors into p clusters
Assign each example to the nearest y-cluster
Calculate the mean and covariance matrix of each x-cluster
Reassign each example to the nearest x-cluster
If the y-labels of the examples in one cluster (S') differ significantly, create a new node and recursively build the tree with the subset S' as input
Retrieval: calculate probability-based distances to each cluster of a node; always continue the search at the k nearest clusters
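Two of the bookkeeping steps can be sketched as below, using plain Euclidean nearest-center assignment and an incremental mean as the first-order statistic; both are simplifications of the actual HDR machinery:

```python
import numpy as np

def nearest_cluster(x, centers):
    """Index of the nearest cluster center (Euclidean distance),
    a stand-in for the probability-based distance of the real tree."""
    return int(np.argmin([np.linalg.norm(x - c) for c in centers]))

def update_mean(mean, count, x):
    """Incremental mean update: clusters keep statistics, never the
    raw samples themselves."""
    return mean + (x - mean) / (count + 1), count + 1
```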
20
Cognitive Mapping
The deeper a node is in the tree, the smaller the variance
Gaussian distributions: a hierarchical version of mixture-of-Gaussians distribution models
Calculating the distance to the clusters:
Euclidean distance
Mahalanobis distance (single covariance matrix)
Gaussian distance (individual covariance matrices)
The choice of distance measure is based on the number of samples available for the corresponding cluster
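The three distance measures can be sketched as follows; writing the Gaussian distance as the Mahalanobis term plus a log-determinant penalty (i.e. a negative log-likelihood up to constants) is our assumed normalization:

```python
import numpy as np

def euclidean(x, mu):
    return float(np.linalg.norm(x - mu))

def mahalanobis(x, mu, cov):
    # Shared covariance matrix across clusters.
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def gaussian_dist(x, mu, cov):
    # Per-cluster covariance: squared Mahalanobis term plus the
    # log-determinant of the cluster's own covariance matrix.
    d = x - mu
    return float(d @ np.linalg.inv(cov) @ d + np.log(np.linalg.det(cov)))
```

With few samples only the Euclidean distance is usable; as a cluster accumulates samples the covariance estimate improves and the richer measures become reliable.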
21
Cognitive Mapping: Distance Measure
Mahalanobis distance, Gaussian distance: an estimate of the covariance matrices is needed; impossible in high-dimensional input spaces
Computations are done in the discriminant space D, not in X
D is calculated by Fisher's linear discriminant analysis (LDA)
LDA calculates the best discriminating space for a K-label classification problem
For q clusters, we get a (q−1)-dimensional discriminant space D
Distance calculations, and hence the calculation of the covariance matrices, are done in the space D
22
Sensorimotor System: Level Building Element (LBE)
S: spatial sensory mapping; T: spatiotemporal sensory mapping
Each internal and external action output feeds back into the sensory input
M: motor mapping; generates concise representations for stereotyped actions and selects the primed context with the highest confidence index
23
Sensorimotor System: Level Building Element (LBE)
A Priority Updating Queue (PUQ) is used for the far-future predictions F
At every time instant, the selected primed context p(t) is put into the PUQ and the oldest entry is removed
Each entry in the queue is updated, beginning with the newest entry
Inspired by the Q-learning algorithm
Information embedded in the future primed context p(t+1) is back-propagated into earlier primed contexts
F can be seen as an averaged future context
Update rules (superscript n counts the updates of an entry; l is the learning rate):
p^(n)(t−i) = (1 − l) p^(n−1)(t−i) + l p^(n)(t−i+1)
Q^(n)(t−i) = (1 − l) Q^(n−1)(t−i) + l [r(t) + γ Q^(n)(t−i+1)]
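The back-propagation through the queue can be sketched as follows; the fixed learning rate l and discount gamma are illustrative stand-ins for the amnesic parameters of the papers:

```python
def puq_update(q, r, l=0.2, gamma=0.9):
    """Back up the reward r through a queue of recent Q-value entries
    (oldest first in the list), Q-learning style: starting with the
    newest entry, each value moves toward its successor's target."""
    q = list(q)
    target = r
    for i in reversed(range(len(q))):      # newest entry first
        q[i] = (1 - l) * q[i] + l * target
        target = gamma * q[i]              # propagated to the older entry
    return q
```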
24
Sensorimotor System: Multilevel Architecture
The low-level architecture uses fine time steps; higher levels can become more abstract
The low-level primed context is used as input for the high-level LBE
The same architecture is used for the different levels
The levels can be used for different kinds of sensory integration (vision, audio, …)
25
AMD: Teaching the Robot
The internal representation of the robot cannot be accessed after creation
Supervised learning: the human imposes an action via buttons or by directly manipulating the robot; the value of an imposed action is set high
Reinforcement learning: the human gives rewards for the action (good = 1, bad = −1) via two buttons
Communicative learning: desired action; whether the current action is good; rules to follow; criteria to judge right and wrong
26
Experiments: Used Robots
SAIL: single robot arm, wheel-driven, 13 DOF
Dav: wheel-driven base, humanoid torso, 43 DOF
Sensors: stereo cameras, microphones, laser range scanner, touch sensors
27
Experiment: Vision-guided navigation
Indoor navigation task using SAIL
A human teacher navigated the robot through corridors: supervised learning
After 4 trips the robot navigated autonomously; the teacher had to push it by hand in certain situations
After 10 trips the robot managed to navigate without help
The experiment was repeated outdoors with limited success
28
Experiment: Learning from Novelty and Rewards
Define a novelty measure: the difference between the primed sensation xp(t) and the actual sensation xl(t)
R(t): reward given by a human
The actual reward combines both; Q-values are learnt via the Prototype Update Queue
A single-level system is used
n(t) = (1/m) Σ_{j=1..m} |x_{p,j}(t−1) − x_{l,j}(t)|
r(t) = α R(t) + (1 − α) n(t)
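A sketch of the novelty measure and the combined reward; the weighting alpha is our assumption, not a value taken from the papers:

```python
import numpy as np

def novelty(x_primed_prev, x_actual):
    """n(t): mean absolute difference between the sensation primed
    at t-1 and the sensation actually observed at t."""
    return float(np.mean(np.abs(np.asarray(x_actual, dtype=float)
                                - np.asarray(x_primed_prev, dtype=float))))

def reward(R, n, alpha=0.5):
    """Combined reward from the human reward R(t) and novelty n(t);
    alpha is a hypothetical mixing weight."""
    return alpha * R + (1 - alpha) * n
```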
29
Experiment: Learning from Novelty and Rewards
3 actions: stay at the current view, look right (30°), look left (30°); 7 absolute viewing positions
Sensory input: simulation: 100 x 100 images; SAIL: 40 x 30 x 3 x 2 images
Experiments:
Habituation effect: started with an initial positive Q-value for staying at the current scene; after a while the robot becomes bored
Integration of novelty and immediate reward: positive reward for turning left, otherwise negative => the robot always turns left
A moving toy added to the environment at viewing angle 0 => the robot stays in this position
30
Experiments: Speech learning
Uses the same developmental architecture
The auditory streams have not been segmented or labeled
During learning, the entire system must listen to everything
No grammatical syntax is involved
Word recognition: numbers from 1 to 10
63 persons, 5 utterances per digit (3150 examples)
4 layers in the sensory mapping module
Input: 13th-order Mel-frequency cepstral coefficients (MFCCs)
Supervised learning
31
Experiments: Speech learning
Selective attention: two layers in the sensory mapping module with different temporal integration
Attention control: choose one of these layers; learned through reinforcement learning
Attention is learned for each word, speaker and utterance
Results after 10 epochs: the tails of "one" and "seven" are quite similar => take the 2nd layer
32
Experiment: Action Chaining
Action chaining: CC, CS1, CS2: voice commands; AS1, AS2: actions; a conditioning problem
Multi-level LBEs are used
Pure reinforcement learning does not work well due to the lack of generalization
The 2nd level gets the averaged version of the future context (the F context) as input => generalization over the current context
CC CS1 AS1 CS2 AS2 ⇒ CC AS1 AS2 (after chaining, the command CC alone elicits the action sequence AS1 AS2)
33
Experiment: Action Chaining
Reinforcement learning takes place in both levels
The lower level proposes an action; the action is executed only if Q2 > 0, otherwise no action is executed
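The gating rule can be sketched in one line (illustrative names; `None` stands for "no action executed"):

```python
def choose_action(proposed, q2):
    """Execute the lower level's proposed action only if the higher
    level's value Q2 is positive; otherwise take no action."""
    return proposed if q2 > 0 else None
```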
Experiment: 4 primitive actions
Behavior establishment: supervised learning
Action chaining: "Start", "one", "two", "three", "four"
Success: "Start" -> execute the actions
The experiment was repeated 20 times
Update rules for the two levels (superscript n counts the updates of an entry; l is the learning rate):
Q2^(n)(t−i) = (1 − l) Q2^(n−1)(t−i) + l [r(t) + γ Q2^(n)(t−i+1)]
Q1^(n)(t−i) = (1 − l) Q1^(n−1)(t−i) + l [r(t) Q2^(n−1)(t) + γ Q1^(n)(t−i+1)]
34
Experiment: Range-Based Navigation
Input: laser scanner, 360 laser rays (0.5° resolution)
Programmed attention control:
If all readings are larger than a threshold T, no special attention is needed because all objects are far away
If some readings are lower than T, only these readings are passed; the other values are replaced by the average value
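The programmed attention rule can be sketched as follows; replacing far readings with the scan's mean is our reading of "the average value":

```python
import numpy as np

def attend(readings, T):
    """Programmed attention for range data: if every reading exceeds
    the threshold T, pass the scan unchanged; otherwise keep only the
    near readings (< T) and replace the rest with the scan's mean
    (the exact replacement value is an assumption)."""
    r = np.asarray(readings, dtype=float)
    if np.all(r > T):
        return r
    return np.where(r < T, r, r.mean())
```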
Simulation experiments: supervised learning; 16 scenarios, 1157 examples; with attention control, a 5-minute run was performed successfully
The result was also tested on the Dav robot: a 15-minute run in a crowded corridor without collision
35
Summary/Conclusion
A new area in robotics where the task is not given to the developer
No human bias; can deal with uncontrolled environments; automatically extendable
Humans can only adjust the behavior by teaching
The only way to control robots in unknown domains
Good methods for dealing with high-dimensional input
Problematic to apply to highly accurate control tasks (humanoid robots)
36
Literature
General Theory
[Lungrella 2004] "Beyond Gazing, Pointing, and Reaching: A Survey of Developmental Robotics"
[Fong 2003] "A Survey of Socially Interactive Robots", Robotics and Autonomous Systems
[Metta 2000] "Babybot: A Study Into Sensorimotor Development", PhD Thesis
[Pfeifer 1999] "Understanding Intelligence", MIT Press
[Sporns 2002] "Embodied Cognition", MIT Handbook of Brain Theory and Neural Networks
[Lungrella 2003] "Learning to Bounce: First Lessons from a Bouncing Robot", in Proc. 4th Int. Conference on Simulation of Adaptive Motion in Animals and Machines
Autonomous Mental Development (papers found on http://www.cse.msu.edu/%7Eweng/research/LM.html)
People involved: J. Weng, Y. Zhang, W. Hwang
[Weng 2004] "Developmental Robotics: Theory and Experiments", International Journal of Humanoid Robotics
[Weng 2002] "A Theory for Mentally Developing Robots", in Proc. 2nd International Conference on Development and Learning
[Weng 2004] "A Theory of Developmental Architecture", in Proc. 3rd International Conference on Development and Learning (ICDL 2004)
37
Literature
Experiments
[Zhang 2002] "Action Chaining by a Developmental Robot with a Value System", in Proc. 2nd International Conference on Development and Learning
[Huang 2002] "Novelty and Reinforcement Learning in the Value System of Developmental Robots", in Proc. Second International Workshop on Epigenetic Robotics
[Zeng 2004] "Obstacle Avoidance through Incremental Learning with Attention Selection", in Proc. IEEE Int'l Conf. on Robotics and Automation
[Zhang 2001] "Grounded Auditory Development by a Developmental Robot", in Proc. INNS/IEEE International Joint Conference on Neural Networks 2001 (IJCNN 2001)
Sensory Mapping
[Zhang 2002] "A Developing Sensory Mapping for Robots", in Proc. 2nd International Conference on Development and Learning
[Weng 2003] "Candid Covariance-Free Incremental Principal Component Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence
Cognitive Mapping
[Hwang 2000] "Hierarchical Discriminant Regression", IEEE Trans. Pattern Analysis and Machine Intelligence
[Weng 2000] "An Incremental Learning Algorithm with Automatically Derived Discriminating Features", in Proc. Asian Conference on Computer Vision