Learning To Use Memory
Nick Gorski & John Laird, Soar Workshop 2011
Slide 2: Memory & Learning
[Diagram: an Agent containing a Memory component interacts with the Environment, sending actions and receiving observations and reward.]
Slide 3: Actions, Internal & External
[Diagram: the Agent takes external actions on the Environment, e.g. {go left, go right, eat food, bid 5 5s, pick a flower}, and internal actions on its Memory, {store, retrieve, maintain}, while receiving observations and reward.]
Slide 5: Internal Reinforcement Learning
[Diagram: a Reinforcement Learning module inside the Agent performs action selection over both external actions (to the Environment) and internal actions (to Memory), driven by observations and reward.]
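The internal-RL setup on this slide can be sketched as a single Q-learner whose action set mixes external moves with an internal memory action. This is a minimal illustrative sketch, not the authors' implementation; all names and parameter values are assumptions.

```python
import random

# Illustrative joint action set: external maze moves plus one internal
# memory action (the bit-memory "toggle" from later slides).
EXTERNAL_ACTIONS = ["forward", "left", "right"]
INTERNAL_ACTIONS = ["toggle"]
ACTIONS = EXTERNAL_ACTIONS + INTERNAL_ACTIONS

def select_action(Q, state, epsilon=0.1):
    """Epsilon-greedy selection over external and internal actions alike."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One-step Q-learning update. Internal actions earn no direct reward;
    their value must be learned from downstream external reward."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    key = (state, action)
    Q[key] = Q.get(key, 0.0) + alpha * (reward + gamma * best_next - Q.get(key, 0.0))
```

The key design point is that the learner is not told which actions are internal: memory actions compete for selection on equal footing, so their value must propagate back from later reward.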
Slide 6: Assumptions
• Custom framework, not using Soar
• Simple memory models
• Simple tasks
Slide 7: Learning to Use Memory
• Research question: when can agents learn to use memory?
• Idea: investigate the dynamics of memory and environment independently
• Need: a simple, parameterized task
Slide 8: An Interactive TMaze
[Diagram: agent in the maze stem; observation: LEFT; available actions: {forward}]
Slide 9: An Interactive TMaze
[Diagram: agent at the junction; observation: DECIDE; available actions: {left, right}]
Slide 10: An Interactive TMaze
[Diagram: agent reaches the cued arm and receives +1 reward]
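The interactive TMaze of slides 8-10 can be sketched as a tiny episodic environment: the goal cue is visible only in the start cell, so the agent must carry it to the junction. This is a hedged sketch; the corridor length, neutral percepts, and the -1 penalty for a wrong turn are assumptions (the slides only show the +1 reward).

```python
import random

class TMaze:
    """Minimal interactive TMaze sketch.

    The agent observes the goal cue (LEFT/RIGHT) only in the start cell,
    a neutral CORRIDOR percept along the stem, and DECIDE at the junction,
    where it must recall the cue to choose the rewarded arm.
    """

    def __init__(self, corridor_length=1):
        self.corridor_length = corridor_length

    def reset(self):
        self.goal = random.choice(["LEFT", "RIGHT"])
        self.pos = 0
        return self.goal  # cue visible only at the start

    def step(self, action):
        if self.pos < self.corridor_length:
            assert action == "forward"  # only move available in the stem
            self.pos += 1
            if self.pos < self.corridor_length:
                return "CORRIDOR", 0.0, False
            return "DECIDE", 0.0, False
        # At the junction: choose an arm; episode ends either way.
        reward = 1.0 if action.upper() == self.goal else -1.0
        return "DONE", reward, True
```

Growing `corridor_length` gives the temporal-delay parameterization from the next slides: the cue and the decision are separated by more steps.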
Slide 11: TMaze
[Diagram: base TMaze with cue A/B at the start and choice point C]
Question: how much knowledge is needed to perform this task?
Slide 12: Parameterized TMazes
[Diagrams: five parameterizations of the base TMaze (cue A/B, choice point C): temporal delay, number of dependent actions (added choice points D), concurrent knowledge (simultaneous cues A/B and X/Y), amount of knowledge (cue values A B W X Y Z), and 2nd-order knowledge.]
Slide 13: Two Working Memory Models
Bit memory (0/1, toggle action):
• Internal action toggles between memory states
• Less expressive
• Ungrounded knowledge
Gated WM (gate action, stores A/B):
• Internal action stores the current observation
• More expressive
• Grounded knowledge
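The contrast between the two models can be made concrete with a minimal sketch. The interfaces here are illustrative assumptions; the slides specify only the internal actions (toggle vs. gate) and the grounded/ungrounded distinction.

```python
class BitMemory:
    """Ungrounded one-bit memory: 'toggle' flips the bit, ignoring the
    observation. The agent must learn what 0 and 1 mean."""

    def __init__(self):
        self.bit = 0

    def act(self, action, observation):
        if action == "toggle":
            self.bit ^= 1  # observation plays no role

    def contents(self):
        return self.bit


class GatedWM:
    """Grounded gated working memory: 'gate' copies the current
    observation into the memory slot, so stored contents carry meaning."""

    def __init__(self):
        self.slot = None

    def act(self, action, observation):
        if action == "gate":
            self.slot = observation

    def contents(self):
        return self.slot
```

Because gated WM stores the cue itself, a retrieved `"A"` directly predicts the correct turn; the bit memory's 0/1 only helps once the agent has separately learned both the cue-to-bit and bit-to-turn associations, which is the chicken-and-egg problem of slide 20.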
Slide 14: TMaze
[Diagram: base TMaze with cue A/B at the start and choice point C]
Slide 15: Bit Memory & TMaze
• Methodology: modify memory to attribute blame
• Interfering behavior in the choice location
• Doesn't manifest with GWM
Slide 16: State Diagram: Bit Memory TMaze
[State diagram: eight states, each a (true goal, percept, memory bit) triple over goals {L, R}, percepts {L, R, C}, and bits {0, 1}. Toggle transitions flip the memory bit in place, up transitions move from the cue location to the choice location (percept C), and left/right transitions leave the choice location. Starting states are marked.]
Slide 17: State Diagram: GWM TMaze
[State diagram: ten states, each a (true goal, percept, memory contents) triple where memory holds a stored observation (L, R, C, or empty). Gate transitions copy the current percept into memory, up transitions move from the cue location to the choice location (percept C), and left/right transitions leave the choice location. Starting states are marked.]
Slide 18: Number of Dependent Actions
[Diagram: TMaze variant with additional dependent choice points D after C]
Slide 19: What We've Learned
• Our machine learning intuition is often wrong (and yours probably is, too!)
• Chicken-and-egg problem
• State ambiguity is very problematic for learning
Slide 20: Chicken & Egg Problem
• Prospective uses of memory are hard
• Case study: bit memory & TMazes
• Endemic across memory models
[Diagram: bit memory (1/0) alongside the base TMaze (cue A/B, choice point C)]
Chicken-and-egg problem:
• Must learn an association between 1 & 0 and A & B
• Must learn an association between 1 & 0 and left & right
• To be effective, can't self-interfere with memory in C!
Slide 21: Implications for Soar
• Soar natively supports learning internal actions
• Next step: learning to use Soar's memories
• Learning alongside hand-coded procedural knowledge is a potentially strong approach
• Soar got the WM model right
• RL will never be a magic bullet
Slide 22: Nuggets & Coal
Nuggets:
• Nearly finished!
• Better understanding of RL + memory, and thus Soar 9
• Parameterized, empirical evaluations of RL gaining traction
• Optimality is not the only metric of performance
Coal:
• Not quite finished!
• Qualitative results, but no closed-form results yet
• No recent results for long-term memories
• Not immediately applicable to Soar