
On Linking Reinforcement Learning with Unsupervised Learning

Cornelius Weber, FIAS

presented at Honda HRI, Offenbach, 17th March 2009

for taking action, we need only the relevant features

[figure: input features x, y, z]

unsupervised learning in cortex

reinforcement learning in basal ganglia (Doya, 1999)

[diagram: state space → actor]

a 1-layer RL model of the BG (go left? go right?) is too simple to handle complex input

complex input (cortex)

need another layer(s) to pre-process complex data

[diagram: feature detection (state space) → action selection (actor)]

models’ background:

- gradient descent methods generalize RL to several layers; Sutton & Barto, RL book (1998); Tesauro (1992; 1995)

- reward-modulated Hebb; Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)

- reward-modulated activity leads to input selection; Nakahara, Neur Comp 14, 819-44 (2002)

- reward-modulated STDP; Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...

- RL models learn partitioning of input space, e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

[diagram: sensory input, reward, action]

scenario: bars controlled by the actions 'up', 'down', 'left', 'right';

reward is given when the horizontal bar is at a specific position
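To make the scenario concrete, here is a minimal sketch of such a bars world in Python. The slides only state that bars are moved by the four actions and that reward depends on the horizontal bar's position; the grid size default, the vertical bar acting as a reward-irrelevant distractor, and the exact movement rules below are my assumptions.

import numpy as np

class BarsEnv:
    """Toy 'bars' world (my reading of the slides; details are assumptions):
    one horizontal and one vertical bar on an n x n grid; 'up'/'down' move the
    horizontal bar, 'left'/'right' move the vertical bar; reward is given only
    when the horizontal bar sits at a target row."""

    ACTIONS = ('up', 'down', 'left', 'right')

    def __init__(self, n=12, target_row=6, rng=None):
        self.n = n
        self.target_row = target_row
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        self.row = self.rng.integers(self.n)   # horizontal bar position (relevant)
        self.col = self.rng.integers(self.n)   # vertical bar position (distractor)
        return self.observe()

    def observe(self):
        """Return the flattened pixel image the agent sees."""
        img = np.zeros((self.n, self.n))
        img[self.row, :] = 1.0                 # horizontal bar
        img[:, self.col] = 1.0                 # vertical (distractor) bar
        return img.ravel()

    def step(self, action):
        if action == 'up':
            self.row = max(self.row - 1, 0)
        elif action == 'down':
            self.row = min(self.row + 1, self.n - 1)
        elif action == 'left':
            self.col = max(self.col - 1, 0)
        elif action == 'right':
            self.col = min(self.col + 1, self.n - 1)
        reward = 1.0 if self.row == self.target_row else 0.0
        return self.observe(), reward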

model that learns the relevant features

top layer: SARSA RL

lower layer: winner-take-all feature learning

both layers: modulate learning by δ

[diagram: input → feature weights → RL weights → action]

SARSA with WTA input layer

note: non-negativity constraint on weights
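A minimal sketch of how such a model could be wired up, under my reading of the slides: a winner-take-all feature layer whose output serves as the state code of a linear SARSA layer, with the TD error δ modulating the weight updates in both layers and the feature weights clipped to be non-negative. The class name, learning rates, γ and ε below are assumptions, not the author's values.

import numpy as np

def wta(h):
    """Winner-take-all: a one-hot copy of the hidden activation."""
    s = np.zeros_like(h)
    s[np.argmax(h)] = 1.0
    return s

class WtaSarsaAgent:
    """Two-layer sketch: WTA feature layer feeding a linear SARSA layer,
    both trained with delta-modulated updates (hyper-parameters assumed)."""

    def __init__(self, n_in, n_hidden, n_actions,
                 gamma=0.9, eps=0.1, lr_q=0.1, lr_w=0.01, rng=None):
        self.rng = rng or np.random.default_rng()
        self.W = self.rng.random((n_hidden, n_in)) * 0.1       # feature weights (non-negative)
        self.Q = self.rng.random((n_actions, n_hidden)) * 0.1  # RL action weights
        self.gamma, self.eps, self.lr_q, self.lr_w = gamma, eps, lr_q, lr_w

    def state(self, x):
        return wta(self.W @ x)                  # WTA feature layer -> state code s

    def act(self, s):
        if self.rng.random() < self.eps:        # epsilon-greedy exploration
            return int(self.rng.integers(self.Q.shape[0]))
        return int(np.argmax(self.Q @ s))

    def update(self, x, s, a, r, s_next, a_next):
        # SARSA TD error
        delta = r + self.gamma * (self.Q[a_next] @ s_next) - (self.Q[a] @ s)
        # top layer: delta-modulated update of the action weights
        self.Q[a] += self.lr_q * delta * s
        # lower layer: delta-modulated Hebbian update of the winning feature
        k = int(np.argmax(s))
        self.W[k] += self.lr_w * delta * x
        np.clip(self.W, 0.0, None, out=self.W)  # non-negativity constraint (slide note)

A training loop would then, per step, compute s = agent.state(x), pick a = agent.act(s), apply BarsEnv.ACTIONS[a] to the environment, and call agent.update with the successor observation, state code and action.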

Energy function: estimation error of state-action value

[equations (not extracted): energy function and the identities used in the derivation; diagram: RL action weights, feature weights, data]
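The equations on this slide did not survive extraction; as a hedged reconstruction in my own notation, the estimation error of the state-action value in SARSA form, with the WTA layer providing the state code, would read:

$$ E = \tfrac{1}{2}\,\delta^{2}, \qquad \delta = r + \gamma\, Q(s', a') - Q(s, a), $$
$$ Q(s, a) = \sum_k q_{ak}\, s_k, \qquad s_k = \mathrm{WTA}_k\!\Big(\sum_j w_{kj}\, x_j\Big). $$

Gradient descent on E (treating the bootstrapped target as constant) gives \( \Delta q_{ak} \propto \delta\, s_k \) for the RL action weights and, approximately, \( \Delta w_{kj} \propto \delta\, s_k\, x_j \) for the feature weights, i.e. the δ-modulated updates used in both layers.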

learning the ‘short bars’ data

short bars in a 12x12 grid; average number of steps to goal: 11

[figure: data, RL action weights and feature weights after learning; input, reward, 2 actions (not shown)]

learning ‘long bars’ data

[figure: results for three conditions: WTA, non-negative weights; SoftMax, non-negative weights; SoftMax, no weight constraints]
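The three conditions differ only in the hidden-layer activation rule (winner-take-all vs. SoftMax) and in whether the feature weights are clipped to be non-negative. A minimal sketch of those two knobs, with function names and the β parameter being my own:

import numpy as np

def softmax(h, beta=1.0):
    """Graded activation used as the alternative to WTA (beta is an assumed parameter)."""
    e = np.exp(beta * (h - h.max()))
    return e / e.sum()

def hidden_code(W, x, rule='wta'):
    """State code of the feature layer under the two activation rules compared here."""
    h = W @ x
    if rule == 'wta':
        s = np.zeros_like(h)
        s[np.argmax(h)] = 1.0     # winner-take-all: one-hot code
        return s
    return softmax(h)             # SoftMax: graded code

def constrain(W, non_negative=True):
    """Apply (or skip) the non-negativity constraint on the feature weights after each update."""
    return np.clip(W, 0.0, None) if non_negative else W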

Discussion

- simple model: SARSA on winner-take-all network with δ-feedback

- learns only the features that are relevant for the action strategy

- underlying theory: (approximate) derivation from the estimation of the value function

- non-negative coding aids feature extraction

- link between unsupervised and reinforcement learning

- demonstration with more realistic data needed

Sponsors

Bernstein Focus Neurotechnology, BMBF grant 01GQ0840

EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3

Frankfurt Institute for Advanced Studies, FIAS


thank you ...
