Upload
calvin-pitts
View
213
Download
0
Tags:
Embed Size (px)
Citation preview
A Multi-Domain Evaluation of Scaling in Soar’s Episodic Memory
Nate Derbinsky, Justin Li, John E. LairdUniversity of Michigan
Soar Workshop 2012 - Ann Arbor, MI 2
Motivation
Prior Work• Nuxoll & Laird (‘12): integration and capabilities• Derbinsky & Laird (‘09): efficient algorithms
Core QuestionTo what extent is Soar’s episodic memory effective and efficient for real-time agents that persist for long periods of time across a variety of tasks?
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 3
Approach: Multi-Domain Evaluation
• Existing agents from diverse tasks (49)– Linguistics, planning, games, robotics
• Long agent runs– Hours-days RT (105 – 108 episodes)
• Evaluate at each X episodes– Memory consumption– Reactivity for >100 task relevant cues
• Maximum time for cue matching <? 50 msec.20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 4
Outline
• Overview of Soar’s EpMem• Word Sense Disambiguation (WSD)• Planning• Video Games & Robotics
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 5
Working Memory
Episodic MemoryProblem Formulation
20 June 2012
Representation• Episode: connected di-graph• Store: temporal sequence
Encoding/Storage• Automatic• No dynamics
Retrieval• Cue: acyclic graph• Semantics: desired features in context• Find the most recent episode that
shares the most leaf nodes in common with the cue
Episodic Memory
Encoding
Storage
Matching
Cue
Soar Workshop 2012 - Ann Arbor, MI 6
Episodic MemoryAlgorithmic Overview
Storage– Capture WM-changes as temporal intervals
Cue Matching (reverse walk of cue-relevant Δ’s)– 2-phase search
• Only graph-match episodes that have all cue features independently
– Only evaluate episodes that have changes relevant to cue features
– Incrementally re-score episodes20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 7
Episodic MemoryStorage Characterization
20 June 2012
0 20 40 60 80 100 120 140 1600
1000
2000
3000
4000
5000
6000
f(x) = 27.8774973529299 x + 52.0287559751198R² = 0.982502738087084
Avg. Working Memory Changes
Avg.
Byt
es p
er E
piso
de
1 week, 8GB
1 day, 8GB
Soar Workshop 2012 - Ann Arbor, MI 8
Episodic MemoryRetrieval Characterization
Assumptions– Few changes per episode (temporal contiguity)– Representational re-use (structural regularity)– Small cue
Scaling– Search distance (# changes to walk)
• Temporal Selectivity: how often does a WME change• Feature Co-Occurrence: how often do WMEs co-occur within a single
episode (related to search-space size)
– Episode scoring (similar to rule matching)• Structural Selectivity: how many ways can a cue WME match an episode
(i.e. multi-valued attributes)
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 9
Word Sense DisambiguationExperimental Setup
• Input: <“word”, POS>; Output: sense #; Result– Corpus: SemCor (~900K eps/exposure)
• Agent– Maintain context as n-gram– Query EpMem for context• If success, get next episode, output result• If failure, null
20 June 2012
Accuracy First Second2-gram 14.57% 92.82%3-gram 2.32% 99.47%
Soar Workshop 2012 - Ann Arbor, MI 10
Word Sense DisambiguationResults
Storage– Avg. 234 bytes/episode
Cue Matching– All 1-, 2-, and 3-gram cues reactive– 0.2% of 4-grams exceed 50msec.
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 11
N-gram Retrieval ScalingRetrieval Time (msec) vs. Episodes (x1000)
20 June 2012
Feat
ure
Co
-Occ
urre
nce
Tem
pora
l Se
lecti
vity
0 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 45000000
0.5
1
1.5
2
2.5
{be, say} (69){say, group} (6){friday, say} (1){say}
0 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 45000000
5
10
15
20
25
{well, be, say}{friday, say, group}{friday, say}{say}
Soar Workshop 2012 - Ann Arbor, MI 12
PlanningExperimental Setup
• 12 automatically converted PDDL domains– Logistics, Blocksworld, Eight-puzzle, Grid, Gripper,
Hanoi, Maze, Mine-Eater, Miconic, Mystery, Rockets, and Taxi
– 44 distinct problem instances (e.g. # blocks)
• Agent: randomly explore state space– 50K episodes, measure every 1K
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 13
PlanningResults
Storage– Reactive: <12.04 msec./episode– Memory: 562 – 5454 bytes/episode
Cue Matching (reactive: < 50 msec.)1. Full State: only smallest state + space size (12)2. Relational: none3. Schema: all (max = 0.08 msec.)
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 14
Video Games & Mobile RoboticsExperimental Setup
• Hand-coded cues (per domain)20 June 2012
Domain Agent Duration Eval. Rate
TankSoar mapping-bot 3.5M 50K
Eaters advanced-move 3.5M 50K
Infinite Mario [Mohan & Laird ‘11] 3.5M 50K
Rooms World [Laird, Derbinsky & Voigt ‘11] 12 hours 300K
Soar Workshop 2012 - Ann Arbor, MI 15
Data: Eaters
20 June 2012
0 500000 1000000 1500000 2000000 2500000 3000000 350000005
101520253035404550
1234567
Episodes (x1 Million)
Retr
ieva
l Tim
e (m
sec)
813 bytes/episode
Soar Workshop 2012 - Ann Arbor, MI 16
Data: Infinite Mario
20 June 2012
0 500000 1000000 1500000 2000000 2500000 3000000 350000005
101520253035404550
1234567891011121314
Episodes (x1 Million)
Retr
ieva
l Tim
e (m
sec)
Structural Selectivity
2646 bytes/episode
Soar Workshop 2012 - Ann Arbor, MI 17
Data: TankSoar
20 June 2012
0 500000 1000000 1500000 2000000 2500000 3000000 350000005
101520253035404550 1
23456789101112131415Episodes (x1 Million)
Retr
ieva
l Tim
e (m
sec)
Co-Occurrence
1035 bytes/episode
Soar Workshop 2012 - Ann Arbor, MI 18
Data: Mobile Robotics
20 June 2012
0
10000000
20000000
30000000
40000000
50000000
60000000
70000000
80000000
90000000
100000000
11000000005
101520253035404550
123456
Episodes (x1 Million)
Retr
ieva
l Tim
e (m
sec) Temporal Selectivity
113 bytes/episode
Soar Workshop 2012 - Ann Arbor, MI 19
Summary of Results
Generality– Demonstrated 7 cognitive capabilities
• Virtual sensing, action modeling, long-term goal management, …
Reactivity– <50 msec. storage time for all tasks (ex. temporal discontiguity)– <50 msec. cue matching for many cues
Scalability– No growth in cue matching for many cues (days!)
• Validated predictive performance models
– 0.18 - 4 kb/episode (days – months)
20 June 2012
Soar Workshop 2012 - Ann Arbor, MI 20
Evaluation
Nuggets• Unprecedented evaluation of
general episodic memory– Breadth, temporal extent, analysis
• Characterization of EpMem performance via task-independent properties
• Soar’s EpMem (v9.3.2) is effective and efficient for many tasks and cues!
• Domains and cues available
Coal• Still easy to construct
domain/cue that makes Soar unreactive
• Unbounded memory consumption (given enough time)
20 June 2012
For more details, see paper in proceedings of
AAAI 2012