2
Outline of the talk
General Theory and Philosophy
Architecture: Sensory Mapping, Cognitive Mapping, Sensorimotor System
Experiments and Results: Reinforcement Learning, Action Chaining
3
Developmental Robotics
Investigates models from developmental psychology and developmental neuroscience
Applies insights from studies in ontogenetic neuroscience
Many different studies: social interaction [Fong 2003], sensorimotor control [Weng 2004, Metta 2003], categorization [Pfeifer 1999], value systems [Pfeifer 1999, Sporns 2002], morphological changes and motor skill acquisition [Lungrella 2002]
Discussed in this talk: Autonomous Mental Development (AMD) [Weng 2001] as one approach to developmental robotics
4
Machine Development Paradigms
Manual development: given a task T and ecological conditions Ec,
the human developer H understands the task T and programs the agent: A = H(T, Ec)
Task-specific architecture, representation and skills are developed by human hands
Autonomous development: given ecological conditions Ec; the task is unknown
The internal representation cannot be predefined
The human developer H writes a task-nonspecific developmental program for the newborn agent: A(0) = H(Ec)
The task has to be understood by the agent itself
After "birth", human teachers can affect the behavior of the robot by: supervised learning, reinforcement learning, communicative learning
5
SASE Agents
Self-Aware and Self-Effecting agent: additionally has an internal environment (the brain)
Internal sensors and internal effectors in addition to external sensors and effectors
E.g. attention control and action release are internal actions
All conscious internal actions have corresponding internal sensors
Both the internal and external environments are used for perception and cognition
6
Internal Representation
Symbolic representation (traditional AI): world-centered: describes an object in the external world with a unique, predefined set of attributes. Each component of the representation has a predefined meaning.
Distributed representation (AMD):
Body-centered: grown from the body's sensors and effectors
Vector form: A = (v1, v2, …, vn), consisting of sensory input and motor control output (or a function of both)
The representation of an object is distributed over many cortical areas
Used by developmental programs because the task is unknown at programming time
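As an illustration, a distributed representation in this sense is just a numeric vector grown from the body's sensors and effectors; the sketch below (function and variable names are ours, not from the AMD papers) builds such a context vector:

```python
import numpy as np

def make_context(sensory, motor):
    """Concatenate sensory input and motor output into one
    body-centered vector A = (v1, ..., vn). Illustrative sketch:
    no component carries a predefined symbolic meaning."""
    return np.concatenate([np.asarray(sensory, dtype=float),
                           np.asarray(motor, dtype=float)])

A = make_context([0.2, 0.7, 0.1], [1.0, -0.5])
```

The point of the vector form is that A is just a point in the agent's sensorimotor space, unlike a symbolic attribute list.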
7
Architecture: past and future contexts
System receives the last context as the input vector: l(t) = <xl(t), al(t)>
xl(t), al(t): last sensation and last action; these include internal sensors and actions
8
Architecture: Primed Contexts
System predicts the primed (future) context: p(t) = <xp(t), ap(t), Q(xp(t), ap(t))>
Predicting a single primed context is not sufficient: there may be multiple future possibilities
Reality mapping R: {p1(t), p2(t), …, pk(t)} = R(l(t))
Value system V: selects a desirable context pi(t) based on its Q-value
A second mapping F covers the far future
R and F are developed incrementally through experience
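The selection step of the value system V can be sketched as follows; the tuple layout and the function name are illustrative assumptions, not the papers' implementation:

```python
# Each primed context is modeled as a tuple (x_p, a_p, Q).
# The value system simply picks the candidate with the highest Q-value.
def value_system(primed_contexts):
    return max(primed_contexts, key=lambda c: c[2])

candidates = [(("x1",), ("a1",), 0.3),
              (("x2",), ("a2",), 0.9),
              (("x3",), ("a3",), 0.1)]
best = value_system(candidates)  # the context with Q = 0.9
```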
9
Sensory Mapping: Staggered Hierarchical Mapping
High-dimensional sensory input: visual: 100 x 100 images = 10000 dimensions; auditory: 300 – 1000 dimensions
Appropriate (autonomous) feature extraction is needed
Inspired by the early human visual pathways
Uses incremental PCA in receptive fields
Filters are applied to the receptive fields; each filter is given by an eigenvector from the PCA calculation
10
Sensory Mapping: Staggered Receptive Fields
Non-overlapping: many filters for one receptive field (RF); all eigenvectors are used for one RF; low resolution
Overlapping (staggered): one filter (a specific eigenvector) per RF; trades off resolution against the dimension of the feature space
11
Sensory Mapping: Layered Structure
Neurons of each layer are organized in a 2-D array (resembles the structure of images)
For each neuron of layer k, localized connections are made to the neurons of layer k-1
At any position with any scale, a neuron can be found whose receptive field approximately covers the region.
12
Sensory Mapping: CCIPCA
Candid Covariance-Free Incremental Principal Component Analysis
Standard PCA is a batch method and therefore not applicable to developmental learning
The input is usually very high-dimensional: the covariance matrix cannot be computed in real time
Standard PCA computes the eigen-directions (directions of maximal variance) of the sample data
The eigen-directions are the eigenvectors of the covariance matrix A = E[u(t) uT(t)] (u(t): zero-mean sample distribution)
13
Sensory Mapping: Incremental PCA
Calculate the 1st eigenvector:
Converges to the eigenvector with the largest eigenvalue
No convergence if the eigenvalues are equal
Higher-order eigenvectors:
Subtract the projection of the data sample onto the lower-order eigenvectors
Apply the same algorithm as for the first eigenvector
Eigen-directions satisfy λ v = A v with A = E[u(t) uT(t)].
Incremental estimate of the first eigenvector from the i-th sample u(i):
v(i) = ((i−1)/i) v(i−1) + (1/i) u(i) uT(i) v(i−1) / ||v(i−1)||
Residual used to estimate the next higher-order eigenvector:
u_{i+1}(i) = u_i(i) − [uT_i(i) v_i(i)/||v_i(i)||] v_i(i)/||v_i(i)||
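The incremental recursion can be sketched in a few lines; this is a simplification that omits the amnesic averaging parameters of the full CCIPCA algorithm, and the function names are ours:

```python
import numpy as np

def ccipca_step(v, u, n):
    """One incremental update of the eigenvector estimate v using
    the n-th zero-mean sample u; v is None before the first sample.
    Sketch of the CCIPCA recursion without amnesic averaging."""
    if v is None:
        return u.copy()
    return (n - 1) / n * v + (1.0 / n) * u * (u @ v) / np.linalg.norm(v)

def residual(u, v):
    """Subtract u's projection onto v, so the next (higher-order)
    eigenvector can be estimated from what remains."""
    vn = v / np.linalg.norm(v)
    return u - (u @ vn) * vn
```

Fed a stream of zero-mean samples, v converges toward the direction of maximal variance, scaled by its eigenvalue.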
14
Sensory Mapping: Eigengroups
The eigengroup of layer k is defined as n x n, where n is the maximum distance between two neurons whose input regions still overlap
Define the eigenvector number for each neuron (filter) in an eigengroup
Usually this ordering is the same for all eigengroups in a layer
Calculate the eigenvectors incrementally with CCIPCA in the predefined order, subtracting the projection of the data onto the corresponding eigenvectors
Inhibition of nearby neurons: detects different, statistically uncorrelated features
Output: the product of the input vector with the eigenvector, passed through a sigmoidal function
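The neuron output described above can be sketched as follows (the exact sigmoid used in the SHM may differ from the logistic function assumed here):

```python
import numpy as np

def neuron_response(x, filt):
    """Response of one sensory-mapping neuron: project the receptive
    field input x onto the neuron's eigenvector filter, then squash
    the projection with a logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x @ filt)))
```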
15
Sensory Mapping: Eigengroups
Eigengroups, sharing method: a single set of filters for all eigengroups
Applied to 5000 natural images
The first several filters are similar to biological receptive-field patterns
16
Sensory Mapping: Selective Attention
Each sensory mapping unit has internal attention effectors
Layered structure: a clear-cut attended region cannot be defined in the input space
Attended region: a 3-D ellipsoid centered at (x, y, l), where l is the layer
Experiments with occlusion:
Occlude either the upper or the lower half of face images
Separate SHMs were used for the different occlusions
Significantly outperforms the approach without attention control
17
Cognitive Mapping: Incremental Hierarchical Discriminant Regression (IHDR)
Cognitive mapping: f : X → Y
X: space of last contexts; Y: space of primed contexts
Find discriminant features in the input space
High-dimensional input space: classical decision trees are not applicable
Modelled by a Hierarchical Discriminant Regression tree
18
Cognitive Mapping: IHDR Tree
Each node contains x-clusters and corresponding y-clusters
The y-clusters determine the virtual class labels, which define to which cluster pair an example (x, y) belongs
The x-clusters approximate the sample population in the X-space
At most q clusters of each type per node
A child node is spawned from the current node if a finer approximation is required
None of the clusters keeps the actual input samples; only first-order statistics are used
19
Cognitive Mapping: HDR Tree
Building the tree for a set of samples S:
Cluster the y-vectors into p clusters
Assign each example to the nearest y-cluster
Calculate the mean and covariance matrix of each x-cluster
Reassign each example to the nearest x-cluster
If the y-labels of the examples in one cluster (S') differ significantly, create a new node and recursively build the tree with the subset S' as input
Retrieval: calculate probability-based distances to each cluster of a node; always continue the search at the k nearest clusters
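Two of the bookkeeping steps can be sketched as below, using plain Euclidean nearest-center assignment and an incremental mean as the first-order statistic; both are simplifications of the actual HDR machinery:

```python
import numpy as np

def nearest_cluster(x, centers):
    """Index of the nearest cluster center (Euclidean distance),
    a stand-in for the probability-based distance of the real tree."""
    return int(np.argmin([np.linalg.norm(x - c) for c in centers]))

def update_mean(mean, count, x):
    """Incremental mean update: clusters keep statistics, never the
    raw samples themselves."""
    return mean + (x - mean) / (count + 1), count + 1
```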
20
Cognitive Mapping
The deeper a node is in the tree, the smaller the variance
Gaussian distributions: a hierarchical version of mixture-of-Gaussians distribution models
Calculating the distance to the clusters:
Euclidean distance
Mahalanobis distance (single covariance matrix)
Gaussian distance (individual covariance matrices)
The choice of distance measure is based on the number of samples available for the corresponding cluster
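The three distance measures can be sketched as follows; writing the Gaussian distance as the Mahalanobis term plus a log-determinant penalty (i.e. a negative log-likelihood up to constants) is our assumed normalization:

```python
import numpy as np

def euclidean(x, mu):
    return float(np.linalg.norm(x - mu))

def mahalanobis(x, mu, cov):
    # Shared covariance matrix across clusters.
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def gaussian_dist(x, mu, cov):
    # Per-cluster covariance: squared Mahalanobis term plus the
    # log-determinant of the cluster's own covariance matrix.
    d = x - mu
    return float(d @ np.linalg.inv(cov) @ d + np.log(np.linalg.det(cov)))
```

With few samples only the Euclidean distance is usable; as a cluster accumulates samples the covariance estimate improves and the richer measures become reliable.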
21
Cognitive Mapping: Distance Measure
Mahalanobis distance, Gaussian distance: an estimate of the covariance matrices is needed; impossible in high-dimensional input spaces
Computations are done in the discriminant space D, not in X
D is calculated by Fisher's linear discriminant analysis (LDA)
LDA calculates the best discriminating space for a K-label classification problem
For q clusters, we get a (q−1)-dimensional discriminant space D
Distance calculations, and hence the calculation of the covariance matrices, are done in the space D
22
Sensorimotor System: Level Building Element (LBE)
S: spatial sensory mapping; T: spatiotemporal sensory mapping
Each internal and external action output feeds back into the sensory input
M: motor mapping; generates concise representations for stereotyped actions and selects the primed context with the highest confidence index
23
Sensorimotor System: Level Building Element (LBE)
A Priority Updating Queue (PUQ) is used for the far-future predictions F
At every time instant, the selected primed context p(t) is put into the PUQ and the oldest entry is removed
Each entry in the queue is updated, beginning with the newest entry
Inspired by the Q-learning algorithm
Information embedded in the future primed context p(t+1) is back-propagated into earlier primed contexts
F can be seen as an averaged future context
Update rules (superscript n counts the updates of an entry; l is the learning rate):
p^(n)(t−i) = (1 − l) p^(n−1)(t−i) + l p^(n)(t−i+1)
Q^(n)(t−i) = (1 − l) Q^(n−1)(t−i) + l [r(t) + γ Q^(n)(t−i+1)]
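The back-propagation through the queue can be sketched as follows; the fixed learning rate l and discount gamma are illustrative stand-ins for the amnesic parameters of the papers:

```python
def puq_update(q, r, l=0.2, gamma=0.9):
    """Back up the reward r through a queue of recent Q-value entries
    (oldest first in the list), Q-learning style: starting with the
    newest entry, each value moves toward its successor's target."""
    q = list(q)
    target = r
    for i in reversed(range(len(q))):      # newest entry first
        q[i] = (1 - l) * q[i] + l * target
        target = gamma * q[i]              # propagated to the older entry
    return q
```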
24
Sensorimotor System: Multilevel Architecture
The low-level architecture uses fine time steps; higher levels can become more abstract
The low-level primed context is used as input for the high-level LBE
The same architecture is used for the different levels
The levels can be used for different kinds of sensory integration (vision, audio, …)
25
AMD: Teaching the Robot
The internal representation of the robot cannot be accessed after creation
Supervised learning: the human imposes an action via buttons or by directly manipulating the robot; the value of an imposed action is set high
Reinforcement learning: the human gives rewards for the action (good = 1, bad = −1) via two buttons
Communicative learning: desired action; whether the current action is good; rules to follow; criteria to judge right and wrong
26
Experiments: Used Robots
SAIL: single robot arm, wheel-driven, 13 DOF
Dav: wheel-driven base, humanoid torso, 43 DOF
Sensors: stereo cameras, microphones, laser range scanner, touch sensors
27
Experiment: Vision-guided navigation
Indoor navigation task using SAIL
A human teacher navigated the robot through corridors: supervised learning
After 4 trips the robot navigated autonomously; the teacher had to push it by hand in certain situations
After 10 trips the robot managed to navigate without help
The experiment was repeated outdoors with limited success
28
Experiment: Learning from Novelty and Rewards
Define a novelty measure: the difference between the primed sensation xp(t) and the actual sensation xl(t)
R(t): reward given by a human
The actual reward combines both; Q-values are learnt via the Prototype Update Queue
A single-level system is used
n(t) = (1/m) Σ_{j=1..m} |x_{p,j}(t−1) − x_{l,j}(t)|
r(t) = α R(t) + (1 − α) n(t)
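A sketch of the novelty measure and the combined reward; the weighting alpha is our assumption, not a value taken from the papers:

```python
import numpy as np

def novelty(x_primed_prev, x_actual):
    """n(t): mean absolute difference between the sensation primed
    at t-1 and the sensation actually observed at t."""
    return float(np.mean(np.abs(np.asarray(x_actual, dtype=float)
                                - np.asarray(x_primed_prev, dtype=float))))

def reward(R, n, alpha=0.5):
    """Combined reward from the human reward R(t) and novelty n(t);
    alpha is a hypothetical mixing weight."""
    return alpha * R + (1 - alpha) * n
```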
29
Experiment: Learning from Novelty and Rewards
3 actions: stay at the current view, look right (30°), look left (30°); 7 absolute viewing positions
Sensory input: simulation: 100 x 100 images; SAIL: 40 x 30 x 3 x 2 images
Experiments:
Habituation effect: started with an initial positive Q-value for staying at the current scene; after a while the robot becomes bored
Integration of novelty and immediate reward: positive reward for turning left, otherwise negative => the robot always turns left
A moving toy added to the environment at viewing angle 0 => the robot stays in this position
30
Experiments: Speech learning
Uses the same developmental architecture
The auditory streams have not been segmented or labeled
During learning, the entire system must listen to everything
No grammatical syntax is involved
Word recognition: numbers from 1 to 10
63 persons, 5 utterances per digit (3150 examples)
4 layers in the sensory mapping module
Input: 13th-order Mel-frequency cepstral coefficients (MFCCs)
Supervised learning
31
Experiments: Speech learning
Selective attention: two layers in the sensory mapping module with different temporal integration
Attention control: choose one of these layers; learned through reinforcement learning
Attention is learned for each word, speaker and utterance
Results after 10 epochs: the tails of "one" and "seven" are quite similar => take the 2nd layer
32
Experiment: Action Chaining
Action chaining: CC, CS1, CS2: voice commands; AS1, AS2: actions; a conditioning problem
Multi-level LBEs are used
Pure reinforcement learning does not work well due to the lack of generalization
The 2nd level gets the averaged version of the future context (the F context) as input => generalization over the current context
CC CS1 AS1 CS2 AS2 ⇒ CC AS1 AS2 (after chaining, the command CC alone elicits the action sequence AS1 AS2)
33
Experiment: Action Chaining
Reinforcement learning takes place in both levels
The lower level proposes an action; the action is executed only if Q2 > 0, otherwise no action is executed
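The gating rule can be sketched in one line (illustrative names; `None` stands for "no action executed"):

```python
def choose_action(proposed, q2):
    """Execute the lower level's proposed action only if the higher
    level's value Q2 is positive; otherwise take no action."""
    return proposed if q2 > 0 else None
```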
Experiment: 4 primitive actions
Behavior establishment: supervised learning
Action chaining: "Start", "one", "two", "three", "four"
Success: "Start" -> execute the actions
The experiment was repeated 20 times
Update rules for the two levels (superscript n counts the updates of an entry; l is the learning rate):
Q2^(n)(t−i) = (1 − l) Q2^(n−1)(t−i) + l [r(t) + γ Q2^(n)(t−i+1)]
Q1^(n)(t−i) = (1 − l) Q1^(n−1)(t−i) + l [r(t) Q2^(n−1)(t) + γ Q1^(n)(t−i+1)]
34
Experiment: Range-Based Navigation
Input: laser scanner, 360 laser rays (0.5° resolution)
Programmed attention control:
If all readings are larger than a threshold T, no special attention is needed because all objects are far away
If some readings are lower than T, only these readings are passed; the other values are replaced by the average value
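The programmed attention rule can be sketched as follows; replacing far readings with the scan's mean is our reading of "the average value":

```python
import numpy as np

def attend(readings, T):
    """Programmed attention for range data: if every reading exceeds
    the threshold T, pass the scan unchanged; otherwise keep only the
    near readings (< T) and replace the rest with the scan's mean
    (the exact replacement value is an assumption)."""
    r = np.asarray(readings, dtype=float)
    if np.all(r > T):
        return r
    return np.where(r < T, r, r.mean())
```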
Simulation experiments: supervised learning; 16 scenarios, 1157 examples; with attention control, a 5-minute run was performed successfully
The result was also tested on the Dav robot: a 15-minute run in a crowded corridor without collision
35
Summary/Conclusion
A new area in robotics where the task is not given to the developer
No human bias; can deal with uncontrolled environments; automatically extendable
Humans can only adjust the behavior by teaching
The only way to control robots in unknown domains
Good methods for dealing with high-dimensional input
Problematic to apply to highly accurate control tasks (humanoid robots)
36
Literature
General Theory
[Lungrella 2004] "Beyond Gazing, Pointing, and Reaching: A Survey of Developmental Robotics"
[Fong 2003] "A Survey of Socially Interactive Robots", Robotics and Autonomous Systems
[Metta 2000] "Babybot: A Study Into Sensorimotor Development", PhD Thesis
[Pfeifer 1999] "Understanding Intelligence", MIT Press
[Sporns 2002] "Embodied Cognition", MIT Handbook of Brain Theory and Neural Networks
[Lungrella 2003] "Learning to Bounce: First Lessons from a Bouncing Robot", in Proc. 4th Int. Conference on Simulation of Adaptive Motion in Animals and Machines
Autonomous Mental Development (papers found on http://www.cse.msu.edu/%7Eweng/research/LM.html)
People involved: J. Weng, Y. Zhang, W. Hwang
[Weng 2004] "Developmental Robotics: Theory and Experiments", International Journal of Humanoid Robotics
[Weng 2002] "A Theory for Mentally Developing Robots", in Proc. 2nd International Conference on Development and Learning
[Weng 2004] "A Theory of Developmental Architecture", in Proc. 3rd International Conference on Development and Learning (ICDL 2004)
37
Literature
Experiments
[Zhang 2002] "Action Chaining by a Developmental Robot with a Value System", in Proc. 2nd International Conference on Development and Learning
[Huang 2002] "Novelty and Reinforcement Learning in the Value System of Developmental Robots", in Proc. Second International Workshop on Epigenetic Robotics
[Zeng 2004] "Obstacle Avoidance through Incremental Learning with Attention Selection", in Proc. IEEE Int'l Conf. on Robotics and Automation
[Zhang 2001] "Grounded Auditory Development by a Developmental Robot", in Proc. INNS/IEEE International Joint Conference on Neural Networks 2001 (IJCNN 2001)
Sensory Mapping
[Zhang 2002] "A Developing Sensory Mapping for Robots", in Proc. 2nd International Conference on Development and Learning
[Weng 2003] "Candid Covariance-Free Incremental Principal Component Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence
Cognitive Mapping
[Hwang 2000] "Hierarchical Discriminant Regression", IEEE Trans. Pattern Analysis and Machine Intelligence
[Weng 2000] "An Incremental Learning Algorithm with Automatically Derived Discriminating Features", in Proc. Asian Conference on Computer Vision