
Source: annakoop.com/papers/KoopHackmanSutton_2010...Poster.pdf


playing catch

catch procedure throw procedure

flick tail

appropriate direction and velocity

tail IR sensors not predicting success

ball predictions not active

succeeds when ball rolls towards partner

terminates when ball not near tail

not near ball

rolls towards

succeeds when tail surrounds ball

tail surrounds ball

IR sensors in tail are low

ball predictions are active

terminates when ball position unknown or unreachable

ball unreachable

ball position unknown

no known procedure can lead to ball predictions

no procedure reliably leads to ball predictions

position to intersect ball

ball partner

another agent

predictions about

interactions

play

reward for learning

less chance of negative reward than competition

object predicting interactions

rolls when I bump into it

looks green

temporal coherence to predictions

Bridging the Implementation Gap: From Sensorimotor Experience to Conceptual Knowledge

Anna Koop, Leah Hackman, Richard S. Sutton

Verifiable knowledge is ABOUT sensorimotor data

The RL Perspective

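[Figure: agent and environment connected in a loop. The agent sends motor signals (m) to the environment and receives sensations (s) from it.]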

The reinforcement learning agent is an input-output system, interacting with an environment that is only accessible via sensation and motor signals.

$\ldots, m_{t-2}, s_{t-2}, m_{t-1}, s_{t-1}, m_t, s_t, m_{t+1}, s_{t+1}, m_{t+2}, s_{t+2}, \ldots$

past: before $t$; current: $t$; future: after $t$
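The interleaved stream of motor and sensor signals can be sketched as a simple loop. This is only an illustration of the input-output view, not the Critterbot's software: the names `run_interaction` and `toy_environment`, the one-dimensional world, and the threshold policy are all hypothetical.

```python
from typing import Callable, List, Tuple

Sensation = Tuple[float, ...]   # hypothetical sensor vector s_t
Motor = int                     # hypothetical motor command m_t


def run_interaction(
    policy: Callable[[Sensation], Motor],
    step: Callable[[Motor], Sensation],
    first_sensation: Sensation,
    num_steps: int,
) -> List[Tuple[Motor, Sensation]]:
    """Return the experience stream [(m_t, s_t), ...] for num_steps steps."""
    stream = []
    s = first_sensation
    for _ in range(num_steps):
        m = policy(s)   # the agent maps its latest sensation to a motor signal
        s = step(m)     # the environment is reachable only through m and s
        stream.append((m, s))
    return stream


def toy_environment() -> Callable[[Motor], Sensation]:
    """A made-up one-dimensional world: the motor signal shifts a position."""
    position = 0.0

    def step(m: Motor) -> Sensation:
        nonlocal position
        position += m
        return (position,)

    return step


# Assumed policy: move right until the sensed position reaches 3, then back off.
stream = run_interaction(
    policy=lambda s: 1 if s[0] < 3 else -1,
    step=toy_environment(),
    first_sensation=(0.0,),
    num_steps=5,
)
```

Everything the agent can later know or verify has to be expressed in terms of entries of `stream`, because nothing else about the environment is observable to it.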

The Gap

sensorimotor data         conceptual knowledge
temporal, dynamic         atemporal, (more) static
individual, subjective    shareable, objective
detailed, situated        abstract, general

Verifiable Signals

At every timestep the agent receives a sensor signal and sends a motor signal. Experience is made up of past, present, and potential future sensorimotor data.

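[Figure: three kinds of signal constructed from the sensor signal $s_t$.]

Historic (h): $x_{t-n}$, $x_t$

Predictive (p): $x_t$, $x_{t+n}$

Compositional (c): $x_t$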

We can construct signals that abstract over time and data yet are still verifiable statements about sensorimotor experience.
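A minimal sketch of the three signal types follows, under stated assumptions. The poster names the types but does not define them, so the particular choices here (a windowed mean for the historic signal, a linear extrapolation for the prediction, a maximum over two sensors for the compositional signal) and all function names are hypothetical. The point is only that each constructed value can be checked against recorded data.

```python
from typing import Sequence


def historic(x: Sequence[float], t: int, n: int) -> float:
    """Historic signal: summarizes x over steps t-n..t (assumed: the mean).

    Requires t >= n so that the window lies inside the recorded data.
    """
    window = x[t - n : t + 1]
    return sum(window) / len(window)


def predict(x: Sequence[float], t: int, n: int) -> float:
    """Predictive signal: a guess at x[t+n] made at step t.

    Assumed form: linear extrapolation from the last two samples (t >= 1).
    """
    return x[t] + n * (x[t] - x[t - 1])


def verify(predicted: float, x: Sequence[float], t: int, n: int,
           tolerance: float) -> bool:
    """Check the prediction once step t+n has actually been observed."""
    return abs(predicted - x[t + n]) <= tolerance


def compositional(x: Sequence[float], y: Sequence[float], t: int) -> float:
    """Compositional signal: combines signals at step t (assumed: the max)."""
    return max(x[t], y[t])


# Hypothetical recorded sensor traces.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]

h = historic(x, t=3, n=2)            # mean of x[1..3]
p = predict(x, t=3, n=2)             # guess at x[5], made at step 3
ok = verify(p, x, t=3, n=2, tolerance=0.1)
c = compositional(x, y, t=1)
```

The prediction abstracts over time, yet it remains a claim about a specific future sensor value, so `verify` can settle it as soon as that value arrives.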

Different Views

We see the Critterbot playing catch with a ball.

The Critterbot sees various sensor and motor signals.
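The two views can be connected by writing a concept as a test on signals. The sketch below uses labels from the poster's diagram ("tail surrounds ball" holds when the IR sensors in the tail are low and ball predictions are active; the catch procedure terminates when the ball position is unknown or unreachable). The `Snapshot` fields, the `IR_LOW` threshold, and the function names are assumptions for illustration, not the Critterbot's actual interface.

```python
from dataclasses import dataclass
from typing import Sequence

IR_LOW = 0.2  # assumed threshold below which a tail IR reading counts as "low"


@dataclass
class Snapshot:
    """Hypothetical summary of the robot's signals at one timestep."""
    tail_ir: Sequence[float]         # tail IR sensor readings
    ball_predictions_active: bool    # whether ball-related predictions are active
    ball_position_known: bool
    ball_reachable: bool


def tail_surrounds_ball(s: Snapshot) -> bool:
    """Concept from our view, stated in terms of the robot's signals."""
    return s.ball_predictions_active and all(v < IR_LOW for v in s.tail_ir)


def catch_succeeded(s: Snapshot) -> bool:
    """The catch procedure succeeds when the tail surrounds the ball."""
    return tail_surrounds_ball(s)


def catch_terminated(s: Snapshot) -> bool:
    """The catch procedure terminates when the ball is unknown or unreachable."""
    return (not s.ball_position_known) or (not s.ball_reachable)


caught = Snapshot(tail_ir=[0.05, 0.10, 0.08], ball_predictions_active=True,
                  ball_position_known=True, ball_reachable=True)
lost = Snapshot(tail_ir=[0.90, 0.80, 0.95], ball_predictions_active=False,
                ball_position_known=False, ball_reachable=False)
```

Here `caught` satisfies the success test and `lost` satisfies the termination test, so an observer's description ("it caught the ball") reduces to checks the robot could run on its own data.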
