
Studies on Goal-Directed Feature Learning

Cornelius Weber, FIAS

presented at: “Machine Learning Approaches to Representational Learning and Recognition in Vision”

Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008

for taking action, we need only the relevant features

[figure: axes x, y, z]

models’ background & overview:

- unsupervised feature learning models are enslaved by bottom-up input

- reward-modulated activity leads to input selection Nakahara, Neur Comp 14, 819-44 (2002)

- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...

- RL models learn a partitioning of the input space, e.g., McCallum, PhD Thesis, Rochester, NY, USA (1996)

- reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007)

(model 3 presented here, extending to delayed reward)

- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)

[figure: sensory input, reward, action]

purely sensory data, in which one feature type is linked to reward

the action is not controlled by the network

model 1: obtaining the relevant features

1) build a feature detecting model

2) learn associations between features

3) register the average features’ reward

4) spread value along associative connections

5) check whether actions in-/decrease value

6) remove features where action doesn’t matter
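Steps 2 to 6 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the model's actual implementation: one winning feature per time step, associative weights taken as action-conditioned transition counts, reward attributed to the feature active when it arrives, value spread by a discounted Bellman-style sweep (γ = 0.9), and a hypothetical pruning tolerance `tol`. The function name `model1_relevant_features` and the toy data are illustrative only.

```python
# Hypothetical sketch of model 1, steps 2-6, on discrete feature indices.


def model1_relevant_features(episodes, n_features, n_actions,
                             gamma=0.9, sweeps=50, tol=0.05):
    """episodes: list of trajectories, each a list of (feature, action, reward)."""
    # step 2: associative weights; counts[f][a][g] = number of transitions f -a-> g
    counts = [[[0.0] * n_features for _ in range(n_actions)]
              for _ in range(n_features)]
    # step 3: register each feature's average reward
    r_sum = [0.0] * n_features
    r_cnt = [0] * n_features
    for traj in episodes:
        for step, (f, a, r) in enumerate(traj):
            r_sum[f] += r
            r_cnt[f] += 1
            if step + 1 < len(traj):
                counts[f][a][traj[step + 1][0]] += 1.0
    r_avg = [s / c if c else 0.0 for s, c in zip(r_sum, r_cnt)]

    def expected_next(weights, value):
        """Mean value of the successor features, weighted by transition counts."""
        total = sum(weights)
        return sum(wt * v for wt, v in zip(weights, value)) / total if total else 0.0

    # step 4: spread value along the associative connections (pooled over actions)
    pooled = [[sum(counts[f][a][g] for a in range(n_actions))
               for g in range(n_features)] for f in range(n_features)]
    value = r_avg[:]
    for _ in range(sweeps):
        value = [r_avg[f] + gamma * expected_next(pooled[f], value)
                 for f in range(n_features)]

    # step 5: action effect = spread of the expected next value across actions
    effect = []
    for f in range(n_features):
        q = [expected_next(counts[f][a], value)
             for a in range(n_actions) if sum(counts[f][a]) > 0]
        effect.append(max(q) - min(q) if q else 0.0)

    # step 6: keep only the features where the choice of action matters
    relevant = [f for f in range(n_features) if effect[f] > tol]
    return value, effect, relevant


episodes = [
    [(0, 0, 0.0), (2, 0, 1.0)],               # action 0 in feature 0 reaches reward
    [(0, 1, 0.0), (1, 0, 0.0), (1, 1, 0.0)],  # action 1 leads to unrewarded feature 1
]
value, effect, relevant = model1_relevant_features(episodes, 3, 2)
print(relevant)  # [0]
```

In this toy data only feature 0 offers a choice between actions with different consequences, so it is the only one kept; feature 1 has the same (zero) outcome whatever the action, and feature 2 has no recorded transitions.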

Földiák, Biol Cybern 64, 165-70 (1990)

→ homogeneous activity distribution

[figure: features with thresholds and lateral weights (decorrelation); selected features with associative weights; action effect; features marked irrelevant / relevant]
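Step 1 builds on Földiák's network. Below is a minimal, assumption-laden sketch of that kind of layer: Hebbian feedforward weights `q`, anti-Hebbian lateral weights `w` that decorrelate the units, and thresholds `t` that push every unit towards the same target firing probability `p`, which is what produces the homogeneous activity distribution. The parameter values, the Euler settling loop, the clipping of lateral weights at zero, and the helper names `settle`, `foldiak_step` and `bars` are illustrative choices, not taken from the slides.

```python
import math
import random

# Hypothetical sketch of a Földiák-style (1990) layer: Hebbian feedforward
# weights q, anti-Hebbian lateral weights w, adaptive thresholds t.
# All parameter values are illustrative assumptions.


def sigmoid(z):
    z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def settle(x, q, w, t, lam=10.0, steps=30, dt=0.1):
    """Relax outputs y under lateral inhibition (Euler steps), then binarize."""
    n = len(q)
    ff = [sum(qij * xj for qij, xj in zip(qi, x)) for qi in q]
    y = [0.0] * n
    for _ in range(steps):
        new_y = []
        for i in range(n):
            u = ff[i] - t[i] + sum(w[i][k] * y[k] for k in range(n))
            new_y.append(y[i] + dt * (sigmoid(lam * u) - y[i]))
        y = new_y
    return [1.0 if yi > 0.5 else 0.0 for yi in y]


def foldiak_step(x, q, w, t, p=0.125, alpha=0.1, beta=0.02, gamma=0.02):
    """One learning step; p is the target firing probability of every unit."""
    y = settle(x, q, w, t)
    n = len(q)
    for i in range(n):
        for k in range(n):
            if i != k:  # anti-Hebbian decorrelation; lateral weights stay <= 0
                w[i][k] = min(0.0, w[i][k] - alpha * (y[i] * y[k] - p * p))
        for j, xj in enumerate(x):  # Hebbian learning of feature weights
            q[i][j] += beta * y[i] * (xj - q[i][j])
        t[i] += gamma * (y[i] - p)  # pushes each unit's firing rate towards p
    return y


def bars(size, prob):
    """Random horizontal and vertical bars on a size x size grid, flattened."""
    rows = [random.random() < prob for _ in range(size)]
    cols = [random.random() < prob for _ in range(size)]
    return [1.0 if rows[r] or cols[c] else 0.0
            for r in range(size) for c in range(size)]


random.seed(0)
n_units, size = 8, 4
q = [[random.random() for _ in range(size * size)] for _ in range(n_units)]
w = [[0.0] * n_units for _ in range(n_units)]
t = [0.0] * n_units
for _ in range(300):
    y = foldiak_step(bars(size, 1.0 / size), q, w, t)
```

The threshold rule is the piece that matters for model 1: a unit that fires more often than `p` raises its threshold, one that fires less lowers it, so all features end up with similar activity.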

Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adapt Behav 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001)

→ relevant features identified

[figure: sensory input, reward]

motor-sensory data (again, one feature type is linked to reward)

the network selects the action (to get reward)

[figure: irrelevant subspace, relevant subspace]

model 2: removing the irrelevant inputs

1) initialize the feature detecting model (but continue learning)

2) perform actor-critic RL, taking the features’ outputs as the state representation

- works despite irrelevant features

- challenge: relevant features will occur at different frequencies

- nevertheless, features may remain stable

3) observe the critic: after long training, it assigns negative value to irrelevant features

4) modulate (multiply) learning by critic’s value
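Steps 2 and 4 might be sketched as follows. This is a minimal sketch assuming a linear critic and actor on the feature outputs `y`, softmax action selection, and a Hebbian-style feature update whose learning rate is multiplied by that feature's critic value; clipping the feature weights to [0, 1] is an added assumption to keep them bounded. All function names and parameters are hypothetical.

```python
import math
import random

# Hypothetical sketch of model 2, steps 2 and 4: actor-critic on feature
# outputs, with feature learning multiplied by each feature's critic value.


def critic_value(v, y):
    return sum(vi * yi for vi, yi in zip(v, y))


def choose_action(actor, y, beta=1.0):
    """Softmax over the action preferences actor[a] . y."""
    prefs = [beta * critic_value(wa, y) for wa in actor]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    return random.choices(range(len(actor)), weights=exps)[0]


def actor_critic_step(x, y, a, r, y_next, v, actor, feat,
                      alpha=0.1, gamma=0.9, eta=0.01, terminal=False):
    """One TD update of critic, actor and value-modulated feature weights."""
    v_next = 0.0 if terminal else critic_value(v, y_next)
    delta = r + gamma * v_next - critic_value(v, y)
    for i, yi in enumerate(y):
        v[i] += alpha * delta * yi          # critic
        actor[a][i] += alpha * delta * yi   # actor, chosen action only
    # step 4: feature learning multiplied by the feature's critic value v[i];
    # a negative value pushes the feature away from the current input
    for i, yi in enumerate(y):
        for j, xj in enumerate(x):
            step = eta * v[i] * yi * (xj - feat[i][j])
            feat[i][j] = min(1.0, max(0.0, feat[i][j] + step))
    return delta


v, actor = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
feat = [[0.5, 0.5], [0.5, 0.5]]
delta = actor_critic_step([1.0, 0.0], [1.0, 0.0], 0, 1.0, [0.0, 1.0],
                          v, actor, feat, terminal=True)
# delta == 1.0, v == [0.1, 0.0], and feat[0] moves towards the input [1.0, 0.0]
```

Because the multiplier is the critic value rather than a fixed rate, features that the critic has learned to value negatively (the irrelevant ones, per step 3) drift away from the inputs that drive them, which is one plausible reading of step 4.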

[figure: frequency and value; panels: features, critic value, action weights]

Lücke & Bouecke, Proc ICANN, 31-7 (2005)

→ relevant subspace discovered

model 3: learning only the relevant inputs

1) top level: reinforcement learning model (SARSA)

2) lower level: feature learning model (SOM / K-means)

3) modulate learning by δ, in both layers
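The three steps can be sketched as below. This is a minimal sketch, assuming a 1-D SOM with a Gaussian neighborhood (a small `sigma` approximates winner-take-all, K-means-like learning), linear Q-values on the feature activations, ε-greedy action choice, and a SOM-style feature update multiplied by the same TD error δ that drives SARSA. The names, parameters and the clipping of feature weights to [0, 1] are assumptions for illustration, not the published model.

```python
import math
import random

# Hypothetical sketch of model 3: SARSA on a SOM-like feature layer, with the
# TD error delta gating learning in both layers.


def som_activation(x, feat, sigma=1.0):
    """Gaussian bump on a 1-D map, centred on the unit closest to input x."""
    dists = [sum((xj - fj) ** 2 for xj, fj in zip(x, fi)) for fi in feat]
    winner = min(range(len(feat)), key=dists.__getitem__)
    return [math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
            for i in range(len(feat))]


def q_values(h, act_w):
    """Linear Q-value of each action on the feature activations h."""
    return [sum(w_ai * h_i for w_ai, h_i in zip(w_a, h)) for w_a in act_w]


def epsilon_greedy(qs, eps=0.1):
    if random.random() < eps:
        return random.randrange(len(qs))
    return max(range(len(qs)), key=qs.__getitem__)


def sarsa_som_step(x, h, a, r, h_next, a_next, act_w, feat,
                   alpha=0.1, eta=0.05, gamma=0.9, terminal=False):
    """One SARSA update; the same delta modulates RL and feature learning."""
    q_next = 0.0 if terminal else q_values(h_next, act_w)[a_next]
    delta = r + gamma * q_next - q_values(h, act_w)[a]
    for i, hi in enumerate(h):              # top level: RL action weights
        act_w[a][i] += alpha * delta * hi
    for i, hi in enumerate(h):              # lower level: SOM-like, delta-gated
        for j, xj in enumerate(x):
            step = eta * delta * hi * (xj - feat[i][j])
            feat[i][j] = min(1.0, max(0.0, feat[i][j] + step))
    return delta


feat = [[0.9, 0.1], [0.1, 0.9]]
act_w = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 0.0]
h = som_activation(x, feat)                 # winner is unit 0
a = epsilon_greedy(q_values(h, act_w))
delta = sarsa_som_step(x, h, a, 1.0, None, None, act_w, feat, terminal=True)
```

In a full episode loop one would compute `h` and choose `a` for the current input, observe the reward and next input, compute `h_next`, choose `a_next`, and then call `sarsa_som_step` (the usual on-policy SARSA ordering). With δ = 0 nothing is learned in either layer, so inputs that never affect reward prediction are not represented.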

[figure: model 3 architecture: input, feature weights, RL weights, action]

model 3: SARSA with SOM-like activation and update

[figure: data and relevant subspace; subspace coverage; RL action weights; feature weights; input and reward; 2 actions (not shown)]

learning ‘long bars’ data

[figure: RL action weights, feature weights]

input data: bars controlled by actions ‘up’, ‘down’, ‘left’, ‘right’

learning the ‘short bars’ data

[figure: reward, action]

short bars in 12x12 input; average # of steps to goal: 11

biological interpretation

[figure: cortex, striatum, GPi (output of basal ganglia); labels: feature/subspace detection, action selection]

- no direct feedback from striatum to cortex

- convergent mapping → little receptive field overlap, consistent with subspace discovery

Discussion

- models 1 and 2 learn all features and identify the relevant ones

- but they either require a homogeneous feature distribution (model 1)

- or can detect only subspaces, not actual features (model 2)

- model 3 is very simple: SARSA on a SOM with δ-feedback

- it learns only the relevant subspace or features in the first place

- link between unsupervised and reinforcement learning

Sponsors

Bernstein Focus Neurotechnology

EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3

Frankfurt Institute for Advanced Studies (FIAS)

[figure: early learning, late learning]

Jog et al., Science 286, 1158-61 (1999)

relevant features change during learning

units in the basal ganglia are active at the junction during early task acquisition but not at a later stage

T-maze decision task (rat)

evidence for reward/action modulated learning in the visual system

Shuler & Bear, "Reward timing in the primary visual cortex", Science, 311, 1606-9 (2006)

Schoups et al., "Practising orientation identification improves orientation coding in V1 neurons", Nature, 412, 549-53 (2001)
