I give permission for public access to my thesis and for any copying
to be done at the discretion of the archives librarian and/or the College
librarian.
Surabhi Gupta ’11
Abstract
The brain is incredibly efficient at storing and searching through large
environments. While we have amassed a burgeoning amount of data on
the brain regions and neuronal mechanisms involved, we do not understand
the framework and processes underlying the cognitive map. This
thesis proposes a novel model of pathfinding using distributed representations
in a hierarchical framework. Compositional rules for the generation of
higher-scale locations are recursively defined using Holographic Reduced
Representations [Plate, 1991]. Pathfinding is based on hierarchical search
across the levels of the hierarchy. Locations are retrieved using autoassociative
recall. This model has many salient, biologically realistic features,
such as automatic generalization, efficient scale-wise search,
robustness and graceful degradation in larger environments.
To test the paradigm of hierarchical spatial reasoning in the brain, I designed
and conducted an event-related potential (ERP) experiment.
During the training phase, the participants were allowed to explore a virtual
environment and were encouraged to visualize and remember paths
between different landmarks. During the experiment, subjects were given
a task followed by three spatial maps which appeared in succession. For
each map, they were asked to indicate whether it was most relevant to the
path-finding task. Significant differences were found between the evoked
potentials in the various conditions, pointing towards the salience of a hierarchical
representation.
Using Holographic Reduced Representations to Model
Hierarchical Spatial Reasoning
by
Surabhi Gupta
Prof. Audrey Lee-St. John
(Research Advisor)
A thesis submitted in partial fulfillment
of the requirements for the
Degree of Bachelor of Arts with Honors
in Computational Neuroscience.
Mount Holyoke College
South Hadley, Massachusetts
30th April, 2011
Acknowledgements
I would like to express my sincere gratitude to my thesis advisor, Professor
Audrey Lee-St. John, for her unwavering support, patience, inspiration and
knowledge, and for helping me navigate my thesis through the interdisciplinary
field of computational neuroscience. I thank Professor Dave Touretzky, who
provided the original formulation of the hierarchical navigation problem as an
HRR-based associative retrieval problem and motivated me to pursue it.
I am thankful to Professors Lee Bowie and Lee Spector for insightful
discussions and hard questions, and to Prof. Paul Dobosh, Prof. Joe Cohen,
Prof. Gary Gillis, Prof. Jane Couperus, Prof. Barbara Lerner, Prof. Lisa
Ballesteros and Prof. Sue Barry for their support and encouragement.
I thank Professor Tai Sing Lee for the opportunity to participate in
the Program in Neural Computation at Carnegie Mellon University. I
thank my labmates Anoopum Gupta, Brian Gereke, Timothy Carroll and
Melanie Cox for stimulating discussions. I thank the Center for Neural
Basis of Cognition for funding my summer project.
I’m grateful to the Computer Science and Neuroscience departments at
MHC and the Cognitive Science department at Hampshire for supporting
me in this endeavor.
Last but not least, I would like to thank my parents, Dr. Shailendra
Kumar Gupta and Mrs. Padma Gupta, for raising me, for their teachings
and for their unconditional support.
Contents
1 Introduction 10
1.1 Discussion of Spatial Knowledge . . . . . . . . . . . . . . . 11
1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 Structure of Thesis . . . . . . . . . . . . . . . . . . . . . . 15
2 Background & Preliminaries 16
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Neuroscience background . . . . . . . . . . . . . . . 17
2.1.2 Computer Science background . . . . . . . . . . . . 18
2.1.3 Connectionism . . . . . . . . . . . . . . . . . . . . 19
2.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Distributed Representations . . . . . . . . . . . . . 23
2.2.2 Autoassociative Memory . . . . . . . . . . . . . . . 25
2.2.3 Hierarchical Spatial Reasoning . . . . . . . . . . . . 26
3 Pathfinding Framework and Process 28
3.1 Pathfinding Framework . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Hierarchical Composition of locations . . . . . . . . 30
3.1.2 Auto-associative Memory . . . . . . . . . . . . . . . 32
3.2 Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.1 Checking for the Goal . . . . . . . . . . . . . . . . 37
3.2.2 Retrieving the next Scale . . . . . . . . . . . . . . . 39
3.2.3 Retrieving the Path . . . . . . . . . . . . . . . . . . 40
3.3 Features of the Algorithm . . . . . . . . . . . . . . . . . . 43
3.4 Analyzing the Process . . . . . . . . . . . . . . . . . . . . 44
4 Extension to Continuous Domain 47
4.1 Extension to Framework . . . . . . . . . . . . . . . . . . . 48
4.1.1 Autoassociative Memories . . . . . . . . . . . . . . 49
4.2 Extension to Process . . . . . . . . . . . . . . . . . . . . . 50
4.2.1 Checking for the Goal . . . . . . . . . . . . . . . . 51
4.2.2 Goal Retrieval . . . . . . . . . . . . . . . . . . . . . 51
5 Experiments and Results 52
5.1 Experiments on the Process . . . . . . . . . . . . . . . . . 52
5.2 Autoassociative Memory . . . . . . . . . . . . . . . . . . . 55
5.2.1 Neural Net . . . . . . . . . . . . . . . . . . . . . . 56
5.2.2 Hopfield Net . . . . . . . . . . . . . . . . . . . . . . 61
6 Event Related Potential Experiment 64
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.2 Methods and Procedures . . . . . . . . . . . . . . . . . . . 65
6.2.1 Training Phase . . . . . . . . . . . . . . . . . . . . 65
6.2.2 Experiment Phase . . . . . . . . . . . . . . . . . . 67
6.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . 69
7 Conclusions and Future Work 74
A Derivation of δ(X,A) 77
B Pathfinding Example 79
B.1 Binary Framework Example . . . . . . . . . . . . . . . . . 79
B.2 Continuous Framework Example . . . . . . . . . . . . . . . 86
B.3 Dimensionality . . . . . . . . . . . . . . . . . . . . . . . . 92
List of Figures
2.1 Cortical hierarchy depicting the formation of invariant rep-
resentations in hearing, vision and touch. Reprinted from
“On Intelligence” [Hawkins and Blakeslee, 2004] . . . . . . 18
2.2 Hierarchical Representation of the Environment. The top-
most node is the root node and the three nodes at the lowest
scale are the leaf nodes. Values for depth and scale are in-
dicated on the left. Height of the tree is the depth of a leaf
node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 A sample hierarchical framework depicted in graphical form.
Each location at higher scale is composed of the constituting
locations at lower scales. Indexes are indicated next to the
scale number. Note: The edges do not encode any feature
of the spatial environment such as distances etc; they are
merely a conceptual aid and not pointers as typically con-
ceived in computer science. This works to our advantage in
designing a recursive compositional rule. . . . . . . . . . . 30
3.2 Locations at scale i are merged together to obtain the merged
sequence µ(i + 1, p). This vector is bound with a key κ(i + 1, p)
to obtain the location at scale i + 1, λ(i + 1, p) . . . . . 31
3.3 The Location memory stores all the state vectors and their
associated keys. It stores as many state vectors as size(V)
and as many keys as size(I) . . . . . . . . . . . . . . . . . 33
3.4 The different nodes along one branch of the hierarchy, from
top to bottom, are packed into a sequence and stored in the
packed hierarchy memory. The shaded path pertains to
the vector: κ(1-5) + κ(1-5) ⊗ κ(2-2) + κ(1-5) ⊗ κ(2-2) ⊗ κ(3-1).
The number of such vectors is given by the
size of set N, the set of all parents of the leaf nodes . . . . 34
3.5 The Supplementary memory stores superposed expressions
with the keys and the merged vector . . . . . . . . . . . . 35
3.6 Example tree depicting nodes (discrete locations) at differ-
ent scales. Pathfinding is hierarchical across the different
scales. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 Flow chart depicting the various stages of the pathfinding
process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.8 Flow chart depicting the various steps involved in checking
for the goal . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.9 Flow chart depicting the various steps involved in retrieving
the parent node . . . . . . . . . . . . . . . . . . . . . . . . 39
3.10 Flow chart depicting the various stages in retrieving the
path once the goal has been found. The memory is queried
recursively for the children nodes associated with the node
at which the goal was found . . . . . . . . . . . . . . . . . 41
4.1 Circular convolution represented as a compressed outer
product for n = 3 [Plate, 1995] . . . . . . . . . . . . . . . . 48
4.2 Flow chart depicting the process of composing state vectors
from nodes at smaller scales . . . . . . . . . . . . . . . . . 50
5.1 2D plot showing the similarity between the packed hierarchy
expression for a location and the sequence retrieved when
scale 3 is probed with that location . . . . . . . . . . . . . 53
5.2 Relationship of Accuracy and Confidence values with the
number of locations . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Extremely sparse, hard-coded vectors . . . . . . . . . . . . 58
5.4 a) Vectors with 30 percent sparsity b) Doubly randomized
vectors with sparsity between 70 and 90% . . . . . . . . . . 60
5.5 Hopfield Net: The output of the network is recursively fed
back as input till the network stabilizes . . . . . . . . . . . 62
5.6 Hopfield Net: The left hand and right hand figures depict
the training samples and the output respectively. The red
and blue regions signify an activation value of 1 and 0 re-
spectively for that unit. The network was trained on 50
orthogonal patterns each represented along one column of
the two dimensional plot. It successfully retrieved the stored
patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.1 Virtual 3D environment explored by the participants. . . . 66
6.2 The inside of a building within the virtual world. Land-
marks such as the red star and the blue circle facilitate the
creation of a spatial map . . . . . . . . . . . . . . . . . . . 67
6.3 From left to right: Map of Building 1, 2 and the neigh-
borhood. The first two are locations at scale 0 while the
neighborhood is at scale 1 . . . . . . . . . . . . . . . . . . 68
6.4 Grand Average ERP Waveforms: These waveforms were
created by averaging together the averaged waveforms of
the individual subjects across the different trials. These
recordings are from the cortical areas above the occipital
lobe, the visual processing region. . . . . . . . . . . . . . . 72
6.5 The Paired Samples T test is used to compare the conditions
pair-wise. Significant differences were found for Pair 1 and
Pair 7 (p < 0.05). Refer to table 6.1 for the experimental
conditions corresponding to the condition numbers. . . . . 73
B.1 Spatial hierarchy depicted in graph-form. In this example,
there are 18 locations at the most refined scale, which are
grouped into 6 locations at scale 1, 2 locations at scale 2
and 1 location at scale 3 . . . . . . . . . . . . . . . . . . . 79
B.2 Spatial hierarchy depicted in graph-form. In this example,
there are 18 locations at the most refined scale, which are
grouped into 6 locations at scale 1, 2 locations at scale 2
and 1 location at scale 3 . . . . . . . . . . . . . . . . . . . 86
B.3 Relationship of Accuracy and Confidence values to Dimen-
sionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
List of Tables
3.1 Table summarizing the description, domain and range for
various functions . . . . . . . . . . . . . . . . . . . . . . . 32
6.1 The condition numbers allotted to the 6 experimental condi-
tions. Each of the three maps could be identified as “most
relevant” or “not relevant” by the participant. However,
their response is not taken into account in the analysis pre-
sented here. . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 The three maps and the two options (relevant or not rele-
vant) create a total of 6 conditions . . . . . . . . . . . . . 69
Chapter 1
Introduction
Locomotion is one of the essential features that distinguish animals from
plants. Birds and fish travel several thousand kilometers, undertaking
the journeys to their breeding or feeding grounds with incredible accuracy
and economy. Rodents in the wilderness regularly find their way among
hundreds or thousands of locations in their natural habitat. To orient
themselves and search through the space, animals may use multiple, multi-
modal cues such as magnetic, visual (e.g. stars, sun, landmarks), olfactory,
auditory, electrical or tactile cues [Dolins and Mitchell, 2010]. These con-
stitute the “compass” of the organism. Research into the sensory process-
ing mechanisms has produced daunting amounts of anatomical data. The
capacity to remember and learn allows spatial knowledge to be condensed
into an internal representation. It has been more than sixty years since Ed-
ward Tolman provided experimental evidence to demonstrate the existence
of a cognitive map [Tolman, 1948]. The compass and the map both play
a crucial role in spatial navigation. While the ‘compass’ is relatively well
understood, the spatial map and its properties remain elusive. We don’t
even have a good framework for understanding how spatial information
is organized and used in pathfinding. Fundamental questions regarding
the cognitive map still remain unanswered. How are locations represented
in the brain? What are the functional and computational properties of
the map? How are the locations acquired, composed, stored, recalled and
decoded? While our scientific techniques have advanced tremendously, we
have a long way to go before we can uncover the neural correlates of the
spatial map.
1.1 Discussion of Spatial Knowledge
Models of spatial knowledge must address two fundamental concepts:
• Spatial Representation which pertains to how and what kind of
information is stored in the spatial map.
• Spatial Reasoning which deals with the process or algorithm used
to peruse the map and search for paths between specific locations.
Even before the existence of a cognitive map was proven, Henri Berg-
son, an influential French philosopher who specialized in intuition, held
the belief that “whereas space is continuous, our model of it is created by
artificially isolating, abstracting, and creating fixed states of consciousness
and integrating them into a simultaneity.” [Reese and Lipsitt, 1975]
Perhaps when Bergson talked about “integrating them into a simul-
taneity”, he was referring to a hierarchical framework. Consider this: our
mental conception of a city is based on all the locations we have visited.
Somehow the lower level spatial entities such as buildings, parks, roads
etc. are amalgamated to give rise to the notion of a city. As we ascend
the hierarchy, higher-level, more abstract concepts are created.
We will be working under the premise that consciousness arises from
the activity of neurons. I take the liberty of interpreting his notion of
“fixed states of consciousness”; he was surely referring to distributed
representations. If the representation of spatial locations had a one-to-one
mapping with neurons, the spatial map would be extremely vulnerable
to damage. In computational neuroscience, concepts are represented by
a network of neurons across their pattern of firing, known as distributed
representations.
1.2 Related Work
Space has been scrutinized through a multitude of lenses and for varying
purposes. Computational models for representing spatial knowledge have
been developed for various purposes such as mobile robot technology, ge-
ographical computation and Vehicle Navigation Systems. Computational
vision systems may combine spatial and temporal data to analyze the dy-
namics of visual scenes ([Cohn et al., 2002], [Rohrbein et al., 2003]). In bi-
ological models of spatial cognition, spatial quantities may be represented
by the means of path integration - the distance and direction from a start-
ing point are kept in memory while traversing a two or three dimensional
environment [Etienne and Jeffery, 2004].
There have been numerous experimental studies exploring the path
finding ability of rats in linear tracks, in different types of mazes such as
the T-maze, alley mazes as well as 2D environments such as the Morris
water maze. These have demonstrated the existence of a cognitive map
[O’Keefe and Nadel, 1978]. The mechanisms of path finding and naviga-
tion is a classic problem that is still being actively pursued. Kubie et
al. propose a model based on vector arithmetic [Kubie and Fenton, 2008];
however, it remains to be seen whether vector addition is carried out in the
neuronal populations. Hopfield’s model based on hippocampal place cells
is not free from sequential search; at a decision point, the rat randomly
chooses a path [Hopfield, 2010]. This leads to a rapid decrease in
performance as the environment grows. Hierarchical search
provides an escape from the sloth and high costs of sequential search.
Space, while continuous, can be readily molded into a hierarchical struc-
ture. Many studies have discovered the existence of hierarchies in the
saliency of spatio-temporal information in human and non-human
primates [Dolins and Mitchell, 2010]. The Hierarchical Path View Model of
Pathfinding for Intelligent Transportation Systems creates a hierarchical
description of the geographical region based on road type classification
[Huang et al., 1997]. The model presented here builds upon these ideas
and proposes a novel hierarchical model of spatial reasoning. The spatial
representation scheme used is that of distributed representations which
have been previously employed in computational modeling of the seman-
tics of language (in extracting analogical similarities, etc.) [Plate, 2000].
These are examined in greater detail in section 2.2.1.
Many Event Related Potential (ERP) studies have explored spatial
navigation and the role of spatial attention in visual processing. Matthew
Mollison examined the elicited electrical potentials during recognition of
landmarks near target vs. non-target locations in a 3D virtual spatial
navigation task [Mollison, 2005]. The event related potentials showed a
significant difference in the P300 component in the parieto-occipital
regions. This experiment shows that “target” vs. “non-target” landmarks
can attain conceptual significance in the spatial map.
Other experiments have demonstrated that it is possible to prime an
individual towards certain spatial locations through sustained attention.
Awh et al. showed that when specific locations were held in the working
memory, attention was directed towards these locations [Awh et al., 1998].
Visual processing of stimuli appearing at that location is enhanced com-
pared to that of stimuli appearing at other locations. Experiments in-
volving non-spatial tasks did not produce any such attentional orienta-
tion supporting the association between spatial working memory and se-
lective attention. Another study demonstrated that if the shifts of
attention to memorized locations are interrupted, memory accuracy declines
[Awh and Jonides, 2001]. Further, they found voltage fluctuations during
the earliest stages of visual processing. This suggests that expectancy re-
garding the location on the screen affects its perception at a very early
stage in visual processing. These studies support the idea that the brain
uses memories to form predictions about what it expects to experience
before [one] experiences it [Hawkins and Blakeslee, 2004].
1.3 Contributions
Holographic Reduced Representations (HRRs) provide a way of encoding
complex structures such as a hierarchy within the framework of
distributed representations. I present two models of hierarchical spatial
reasoning based on different types of HRRs: Plate’s real-valued HRR and
Kanerva’s Binary HRR [Plate, 1991, Kanerva, 1997]. Various operations
in the HRR framework result in noisy vectors which are cleaned up in an
autoassociative memory; during the path-finding process, locations are re-
trieved using autoassociative recall. To test the hypothesis of hierarchical
spatial reasoning, I designed an ERP experiment that introduces atten-
tional bias towards a particular level in the hierarchy through the use of a
pathfinding task.
1.4 Structure of Thesis
The theoretical groundwork introduced in this chapter will be examined
in greater detail in chapter 2. It presents the foundational ideas in connec-
tionism and neuroscience that form the basis of the pathfinding framework
and process presented in chapter 3. The model presented in this chapter
uses binary representations. An extension to the continuous domain using
Plate’s convolutional algebra is presented in chapter 4. Chapter 5 presents
the results obtained from running different tests on the framework and
process. Chapter 6 describes an ERP experiment designed to test the
paradigm of hierarchical spatial reasoning at a high level.
Chapter 2
Background & Preliminaries
The model presented in this thesis is inspired by artificial intelligence and
neuroscience. This chapter presents some foundational ideas on knowledge
representation, hierarchical reasoning in the brain and connectionism. Sec-
tion 2.2 provides the preliminaries that are crucial to understanding the
models presented in subsequent chapters.
2.1 Background
The spatial map consolidates and depicts spatial knowledge, including the
paths, the relationship between different entities, etc. The problem of
spatial representation and spatial reasoning has been addressed by com-
puter science and neuroscience in various ways. This section presents a
general overview of some of these approaches with special emphasis on a
hierarchical framework.
2.1.1 Neuroscience background
The divide and conquer strategy used in hierarchical algorithms is not con-
fined to computer science. There is evidence that humans abstract space
into multiple levels [Car and Frank, 1994]. In fact, a diverse range of neu-
ral modalities such as visual perception, language and cognition operate
within a hierarchical layout. This topology is found in the organization of
the cortex - a sheet of neural tissue that forms the outermost layer of the
mammalian brain. This region developed later in the evolutionary pro-
cess than sensory processing regions. It plays an important role in many
higher-level functions such as attention, language and cognition. Converging
evidence from many sources indicates that the topography of the cortex
is extensively hierarchical; the majority of the human cortex contains asso-
ciation areas where converging streams of information from many different
systems are integrated [Felleman and Van Essen, 1991, Essen and Maunsell, 1983].
For instance, lower-level information about the colors, shapes, textures etc.
is combined in the cortex such that one cell may fire specifically in response
to faces. The spatial and temporal patterns are integrated to form more
abstract, higher level, stable representations as we ascend the levels of the
cortical hierarchy (see figure 2.1).
Hierarchical Temporal Memory (HTM) is a promising model of information
processing in the neocortex within a mathematical and algorithmic
framework. “Information flows up and down the sensory hierarchies
to form predictions and create a unified sensory experience. This new
hierarchical description [HTM] helps us understand the process of creat-
Figure 2.1: Cortical hierarchy depicting the formation of invariant representations in hearing, vision and touch. Reprinted from “On Intelligence” [Hawkins and Blakeslee, 2004]
ing invariant representations.” [Hawkins and Blakeslee, 2004]. Inspired by
this model of the cortex, this project proposes an analogous framework for
representing spatial locations at different scales. Higher-level representations
are perceptually conceived from familiar locations at smaller scales.
This model proposes that pathfinding involves traversing the hierarchy of
spatial knowledge.
2.1.2 Computer Science background
To simplify our perception of space, we intuitively impose a hierarchi-
cal structure onto the external environment. Goals are searched for by
traversing the hierarchy. This allows us to employ a “divide and conquer”
strategy to search for specific locations; one can search along a path from
top to bottom, excluding large areas of the problem space. Such models
can improve upon other strategies such as precomputing shortest paths or
exhaustive search, which tend to have high storage costs and worse case
complexity.
As mentioned in section 1.2, computational cost that grows with the
size of the environment is one of the major drawbacks of models based
on sequential or random search. Hierarchical search helps overcome this
shortcoming. Such models are traditionally represented in a localist
framework, and the number of localist elements required to represent a
complex recursive structure becomes prohibitive because of exponential growth
[Rachkovskij, 2001]. Moreover, such models involve meaningless pointer
following to retrieve the constituents at the lowest hierarchical level. The
resulting increase in the cognitive load of the spatial representation becomes
highly problematic as the environment grows.
2.1.3 Connectionism
The goal of this project is to understand the neural basis of spatial cog-
nition in the brain. We approach this issue from the perspective of con-
nectionism - the study of mental phenomena as emergent processes of a
network of interconnected units. It derives inspiration from many fields
such as artificial intelligence, neuroscience, cognitive science, philosophy
of mind, etc. Connectionist models of information processing in the brain
are based on a few simple, fundamental principles [McLeod et al., 1998]
1. The basic computational operation in the brain involves one neuron
passing information related to the sum of the signals reaching it to
other neurons.
2. Learning changes the strength of the connections between neurons
and thus the influence one has on another.
3. Cognitive processes involve the basic computation being performed
in parallel by a large number of neurons.
4. Information, whether about an incoming signal or representing the
network’s memory of past events, is distributed across many neurons
and many connections.
Connectionist models have many useful features such as graceful degra-
dation, automatic generalization, fault tolerance, etc.
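The first two principles above can be condensed into a toy sketch. The logistic activation, the Hebbian-style update rule and all numeric values below are illustrative assumptions, not part of the model developed in this thesis:

```python
import numpy as np

def unit_output(inputs, weights):
    """Principle 1: a unit passes on a function of the weighted
    sum of the signals reaching it (a logistic squash, here)."""
    return 1.0 / (1.0 + np.exp(-np.dot(inputs, weights)))

# Principle 2: learning changes connection strengths.
# One Hebbian-style step: strengthen weights from active inputs.
x = np.array([1.0, 0.0, 1.0])        # incoming signals
w = np.array([0.2, -0.1, 0.4])       # connection strengths
y = unit_output(x, w)
w_new = w + 0.1 * y * x              # illustrative learning rate of 0.1

# After learning, the same input drives the unit more strongly.
print(unit_output(x, w_new) > y)     # True
```

Principles 3 and 4 then amount to running many such units simultaneously, with each concept spread across the activity of many of them.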
Parallel, Distributed processing in Cognitive Models
Traditional models of cognitive processing as well as those of pathfind-
ing assume that information is represented in a localist fashion. In other
words, a concept in these models is stored at individual, independent loca-
tions. For instance, to implement a breadth first search, each node in the
graph may represent a particular location in the environment with edges
signifying the distance between these locations. On the other hand, repre-
sentation is distributed in connectionist models. This may seem counterintuitive
to our notion of data storage, even storage in a general sense.
Consider this: on average, thousands of neurons die every day, causing
random loss of information. Moreover, there is a certain amount of noise
and stochasticity associated with neural firing. The brain has found a so-
lution to unpredictability by performing parallel computations on concepts
that are distributed over many neurons.
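The localist breadth-first search mentioned above can be made concrete in a few lines; the location names and adjacency structure here are purely illustrative:

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Breadth-first search over a localist map: every location is a
    single discrete symbol, and every edge must be stored explicitly."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None

# Illustrative environment: a few named places and their connections
graph = {
    "dorm": ["library"],
    "library": ["dorm", "park"],
    "park": ["library", "lab"],
    "lab": ["park"],
}
print(bfs_path(graph, "dorm", "lab"))  # ['dorm', 'library', 'park', 'lab']
```

Note that damaging any single entry in `graph` destroys all knowledge of that location, which is exactly the fragility a distributed scheme avoids.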
Distributed Representations
As previously mentioned, distributed representations are high-dimensional
vectors that are used to represent concepts. They have many advantages
such as:
• Automatic Generalization: Locations that are similar will have
similar patterns of activation; they do not need to be explicitly repre-
sented as similar; similar activation patterns produce similar results.
This is biologically realistic, since signals received in the brain, which
are hardly ever identical, are nevertheless identified as specific objects,
places, people, scenes, etc. [Kanerva, 1993].
• Representational efficiency: Distributed representations can pro-
vide a more efficient code than localist representations. A localist
representation using n neurons can represent just n different entities.
A distributed representation using n binary neurons can represent up
to 2^n different entities (using all possible patterns of zeros and ones)
[Plate, 2002].
• Soft Capacity Limits and Graceful Degradation: Distributed
representations typically have soft limits on how many concepts can
be represented simultaneously before interference becomes a serious
problem. The accuracy of goal retrieval degrades gradually with in-
crease in the size of environment or damage to the neural structures.
In the 90s, connectionism faced a major roadblock: it was missing a tech-
nique for representing more sophisticated structures such as trees within
a distributed framework. Hinton introduced the notion of a reduced de-
scription as a way of encoding complex, conceptual structure in a distributed
framework [Hinton, 1990]. Holographic Reduced Representations (HRR)
capture the spirit of these reduced representations[Plate, 2003]. The ex-
traordinary feature of HRRs is that they allow the representation of a
concept across the same number of units as each of the constituents, thus
avoiding the problem of expanding dimensionality. HRRs provide a way
to combine two vectors to make a memory trace that is represented across
the same number of bits. Even more remarkably, the memory trace can be
used along with one of the original vectors to retrieve the other (hence
the term holographic, which refers to the technique of storing and reconstructing
the light scattered from an object after the object is no longer
present). With their many attractive qualities, HRRs provide an ideal tool
for constructing a model for hierarchical spatial reasoning in a two-dimensional
environment. This work presents a novel application of HRRs to
hierarchical pathfinding.
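As a minimal sketch of this bind-and-decode cycle, the following uses circular convolution computed via the FFT, as in Plate's real-valued HRRs; the dimensionality, the random seed and the similarity thresholds are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024  # dimensionality; retrieval accuracy improves as n grows

def rand_vec():
    # HRR elements are drawn i.i.d. from a normal with variance 1/n
    return rng.normal(0.0, 1.0 / np.sqrt(n), n)

def bind(a, b):
    # circular convolution, computed in the Fourier domain
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def unbind(trace, cue):
    # circular correlation with the cue recovers a noisy
    # version of the vector the cue was bound with
    return np.fft.ifft(np.fft.fft(trace) * np.fft.fft(cue).conj()).real

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

key, filler = rand_vec(), rand_vec()
trace = bind(key, filler)

# the trace resembles neither constituent...
print(abs(cosine(trace, filler)) < 0.2, abs(cosine(trace, key)) < 0.2)
# ...yet decoding with the key recovers something close to the filler
print(cosine(unbind(trace, key), filler) > 0.4)
```

In the full model, the noisy decoded vector would then be passed through the autoassociative memory to clean it up into an exact stored location.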
2.2 Preliminaries
In the last section, we learnt that a connectionist model represents con-
cepts in a distributed fashion. In subsection 2.2.1, these representations
and their properties are examined in greater detail. This will be rele-
vant to understanding the Pathfinding Framework and Process presented
in Chapter 3. Subsection 2.2.2 introduces the concept of an autoassocia-
tive memory that supports parallel computation in a biologically realistic
fashion.
2.2.1 Distributed Representations
Distributed representations use high-dimensional vector spaces to repre-
sent the elements of the model, i.e. locations in the environment. Dis-
tributed representations have two main properties:
• Each concept (e.g., an entity, token, or value) is represented by more
than one neuron (i.e., by a pattern of neural activity in which more
than one neuron is active).
• Each neuron participates in the representation of more than one
concept.
In this model, the neural codes or patterns of activity corresponding to
locations in the environment are represented as fixed, high-dimensional,
binary vectors. Similarity of two neural patterns can be computed in dif-
ferent ways depending on the framework. When using real-valued vectors,
similarity is the dot product followed by thresholding or normalization. For
binary vectors, similarity is defined as the number of corresponding, iden-
tical bits in the two vectors. Two operations are used to form associations
within this framework: the binding and merge operations, both of which retain
the dimensionality of the vectors on which they operate.
Binding Operation
The binding operation on two vectors combines them into a third vector
which is not significantly similar to either one of its constituents. The
resulting vector, known as the trace, can be used with one of the constituting
vectors to retrieve the other vector. Binding is a flexible operator with
many useful properties:
• Commutativity: A ⊗ B = B ⊗ A
• Associativity: (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)
• Self-inverse (a property unique to xor): A ⊗ B ⊗ B = A and B ⊗ B ⊗ A = A

The binding operator for binary representations is the exclusive-or (xor) operator (denoted by ⊗), a logical operator that is true iff exactly one of its operands is true. In this case the decoding is perfect; the retrieved vector is identical to the original. For real-valued distributed representations such as those in Plate's convolutional algebra, the binding operator is the convolution operator [Plate, 1995].
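The xor binding and its self-inverse property can be sketched in a few lines of Python (a numpy sketch, not the thesis's own implementation; the function name `bind` is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000  # dimensionality of the binary state vectors

def bind(a, b):
    # Binding via exclusive-or; the same operator also decodes.
    return np.bitwise_xor(a, b)

A = rng.integers(0, 2, d)
B = rng.integers(0, 2, d)

trace = bind(A, B)           # not significantly similar to A or B
recovered = bind(trace, A)   # self-inverse: (A xor B) xor A == B

assert np.array_equal(recovered, B)           # decoding is exact for xor
assert abs(np.mean(trace == A) - 0.5) < 0.05  # trace looks random w.r.t. A
```

For xor the trace agrees with A only where B's bit is 0, so their similarity sits at chance (0.5), while decoding is exact rather than approximate.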
Merge Operation ‘+’
This is an important feature of distributed representations; the superposition or merge operation between two or more vectors generates a vector that is similar to each of its constituents. There is no "unmerge" operation that allows decoding from a superposition. Since the vectors are independently and uniformly distributed, high-dimensional vectors, there is a minuscule probability that two random vectors exhibit significant similarity by pure chance. The binding operation is distributive across merge: A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C).

The merge operation for binary vectors is known as thresholding: each bit of the result vector is 0 or 1 depending on which appears in that position most often among all the constituting vectors; ties are broken at random with probability 0.5. In this thesis, the merge operation is denoted by '+', not to be confused with the addition operator.
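A minimal sketch of the thresholding merge, again assuming numpy and 10,000-dimensional binary vectors (the helper names `merge` and `similarity` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10_000

def merge(vectors):
    # Bit-wise majority vote; ties are broken at random with probability 0.5.
    counts = np.sum(vectors, axis=0)
    out = (counts * 2 > len(vectors)).astype(int)
    ties = counts * 2 == len(vectors)
    out[ties] = rng.integers(0, 2, ties.sum())
    return out

def similarity(a, b):
    # Fraction of corresponding, identical bits.
    return np.mean(a == b)

A, B, C = (rng.integers(0, 2, d) for _ in range(3))
M = merge([A, B, C])

assert similarity(M, A) > 0.7           # merged vector resembles each constituent
assert similarity(M, C) > 0.7
assert 0.45 < similarity(A, B) < 0.55   # unrelated vectors sit near chance
```

With three constituents the merged vector agrees with each of them on roughly 75% of the bits, while two unrelated random vectors agree on about 50%.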
2.2.2 Autoassociative Memory
The intuition behind an autoassociative memory is the simple idea that "the distances between concepts in our mind correspond to distances between points in a high-dimensional space" [Kanerva, 1993]. An autoassociative memory stores different patterns of activation across the same set
of connections. It is able to perform pattern-completion and thus is an
ideal candidate for a computational model of recall. The auto-associative
memory takes a noisy vector as input and outputs a specified number of
vectors with the highest similarity scores (or indicates that the input is
not significantly similar to any of the stored vectors). It is used to clean
up the noisy vectors that result from various encoding and decoding op-
erations. This is similar to a standard item memory which retrieves the
best-matching vector when cued with a noisy vector, or retrieves noth-
ing if the best match is no better than what results from random chance
[Kanerva, 1998]. In our implementation the auto-associative memory can
return more than one item if each gives a high similarity and the similarity
values are very close to each other (within 1%). These memories give the
system the capacity for recall along with recognition.
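The list-based clean-up behaviour described above can be sketched as follows (a simplified Python/numpy illustration; the class name, the 0.55 significance threshold, and the 1% tie window are assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10_000

def similarity(a, b):
    return np.mean(a == b)

class AutoassociativeMemory:
    """List-based clean-up memory: returns the stored item(s) most similar
    to a noisy cue, or nothing if the best match is no better than chance."""

    def __init__(self, threshold=0.55, tie_window=0.01):
        self.items = []
        self.threshold = threshold    # assumed cutoff for significance
        self.tie_window = tie_window  # return all items within 1% of the best

    def store(self, v):
        self.items.append(v)

    def recall(self, cue):
        scores = [similarity(cue, v) for v in self.items]
        best = max(scores)
        if best < self.threshold:
            return []   # cue is not significantly similar to anything stored
        return [v for v, s in zip(self.items, scores)
                if best - s <= self.tie_window]

memory = AutoassociativeMemory()
stored = [rng.integers(0, 2, d) for _ in range(20)]
for v in stored:
    memory.store(v)

# Flip 10% of the bits of one stored vector, then clean it up.
noisy = stored[3].copy()
noisy[rng.choice(d, d // 10, replace=False)] ^= 1
result = memory.recall(noisy)
assert len(result) == 1 and np.array_equal(result[0], stored[3])
```

A cue that matches nothing (e.g., a fresh random vector) comes back empty, which is the "no better than random chance" case described above.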
2.2.3 Hierarchical Spatial Reasoning
As mentioned in section 2.1.2, this model imposes a hierarchical framework
on the environment. Let the natural habitat contain salient locations sepa-
rated by empty stretches that are not explicitly represented in the cognitive
map. Information about the discrete, salient locations is acquired from the
environment through extended exploration and is consolidated within a hi-
erarchical framework. This section introduces terminology pertaining to
this tree data structure that will support the pathfinding process. Leaf
nodes or leaves are the locations in the environment at the smallest scale.
All other nodes in the tree, referred to as internal nodes (I), are composed
of nodes at smaller scales (See figure 2.2). For instance, if individual rooms
inside a building were to be leaf nodes, the building, the neighborhood,
and other locations at higher scales would be represented as internal nodes.
Every leaf node has a parent, i.e., the node it belongs to at the next
higher scale. The depth of a node is the length of the path from the root to that node. The root node at the topmost level has depth 0. In our model, we
will assume that all the leaves are at the same depth. Every internal node
has at least one child. The height of the tree is the depth of a leaf node.
The scale of a given node is defined as the height − depth. This gives us
a numbering system where locations at more refined levels have a smaller
Figure 2.2: Hierarchical Representation of the Environment. The topmost node is the root node and the three nodes at the lowest scale are the leaf nodes. Values for depth and scale are indicated on the left. Height of the tree is the depth of a leaf node.
scale. Search is sequential across the different levels of the hierarchy. The
next chapter describes the hierarchical framework in greater detail along
with the process of search through it.
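Before moving on, the depth/height/scale terminology can be illustrated with a toy hierarchy (a Python sketch; the node names and the parent map are hypothetical):

```python
# Hypothetical three-level hierarchy: rooms inside a building inside
# a neighborhood (the root has no entry in the parent map).
parent = {
    "building": "neighborhood",
    "room_a": "building",
    "room_b": "building",
}

def depth(node):
    # Length of the path from the root down to the node.
    return 0 if node not in parent else 1 + depth(parent[node])

height = max(depth(leaf) for leaf in ("room_a", "room_b"))  # depth of a leaf

def scale(node):
    # Locations at more refined levels have a smaller scale.
    return height - depth(node)

assert depth("neighborhood") == 0 and scale("neighborhood") == 2
assert scale("room_a") == 0 and scale("building") == 1
```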
Chapter 3
Pathfinding Framework and
Process
“All truly great thoughts are conceived while walking” - Nietzsche
Chapter 2 introduced Holographic Reduced Representations as a tech-
nique of encoding complex, compositional structure. In this chapter, these
are used as building blocks to create a hierarchical representation of the
environment i.e., the framework. The framework consists of an autoas-
sociative memory that stores different compositional vectors. They con-
ceptually represent the associations formed between various locations in
the environment. During the pathfinding process, locations are retrieved
using autoassociative recall. Section 3.1 describes the framework which
involves the recursive generation of state vectors for locations at higher
scales. Section 3.2 describes the process of searching for a path from a
start location to a goal location within this hierarchical framework. The
autoassociative memories that were populated in Section 3.1 are queried
at many stages of the pathfinding process.
3.1 Pathfinding Framework
This model works under the assumption of a fully-explored environment.
In other words, the leaf nodes of the tree described in subsection 2.2.3 are
given as input to the model. These distinct, discrete locations are rep-
resented using randomly generated, high-dimensional, binary distributed
state vectors (the extension to the continuous domain is presented in Chap-
ter 4.) State vectors corresponding to nodes at higher scales of the spatial
hierarchy are recursively generated using the nodes at lower scales. These
state vectors are stored in the autoassociative memory, in some cases after
further composition. This section describes the "pre-processing" steps, i.e., the compositional rules required to generate the hierarchy
(shown in figure 3.1).
Let us introduce set notation to formalize the distributed framework
associated with the spatial hierarchy. Let T=(V,E) denote the tree repre-
sentation of the spatial environment. V is the set of all nodes in the tree
and E (edges) conceptually encode the hierarchical ordering; they are not
explicitly represented but are useful in hierarchical composition. Elements
of the set V are one of two types: leaf nodes and internal nodes (previously
introduced). If L is the set of leaf nodes and I is the set of internal nodes,
the equivalent description in set notation is given by: V = L ∪ I.
Figure 3.1: A sample hierarchical framework depicted in graphical form. Each location at a higher scale is composed of the constituting locations at lower scales. Indexes are indicated next to the scale number. Note: The edges do not encode any feature of the spatial environment, such as distances; they are merely a conceptual aid and not pointers as typically conceived in computer science. This works to our advantage in designing a recursive compositional rule.

All nodes are represented as state vectors with d individual units (the dimensionality d is usually between 1,000 and 10,000). Each unit can be active ('1') or inhibited ('0'). The leaf nodes, referred to as ψ, are generated randomly such that the probability of activation of each unit is 0.5.
ψ (psi) : L → {0, 1}^d
λ (lambda) : V → {0, 1}^d
Section 3.1.1 presents the compositional rules for the recursive generation
of locations.
3.1.1 Hierarchical Composition of locations
Higher-scale locations are recursively composed of those at smaller scales.
This is based on the intuition that the state vectors for locations at higher
scales emerge from exploration of sub-regions and are systematically re-
Figure 3.2: Locations at scale i are merged together to obtain the merged sequence µ(i+1, p). This vector is bound with a key κ(i+1, p) to obtain the location at scale i+1, λ(i+1, p).
lated to them. As a reminder, the binding operation denoted by ⊗ is the
exclusive-or operation. It doubles up as the decoding operator. Similarity
between two binary vectors is computed as 1 − H/d, where H is the Hamming distance and d is the dimensionality.
Every internal node v ∈ I is associated with a key vector and a merged
vector denoted by κ(n− i) and µ(n− i) respectively, where n is the scale
and i is the index of that location. A location at scale n (where n > 0) is constructed from its constituent locations at scale n − 1 in two steps:
1. The constituting scale n − 1 vectors are merged or superimposed to obtain the merged vector. The merged expressions, defined for all internal nodes, are obtained as follows:

∀v ∈ I, µ(v) = ∑_{u ∈ children(v)} λ(u)
2. The vector for scale n is then computed by binding the merged vector
with a unique key κ(v) for that location: λ(n−i) = κ(n−i)⊗µ(n−i).
The spatial locations are defined as:
Function     Domain and Range   Intuitive Association
ψ (psi)      L → {0, 1}^d       Leaf Nodes
µ (mu)       I → {0, 1}^d       Merged Vector
κ (kappa)    I → {0, 1}^d       Keys
λ (lambda)   V → {0, 1}^d       Spatial Locations (State Vectors)
π (pi)       L → {0, 1}^d       Packed Hierarchy

Table 3.1: Table summarizing the description, domain and range for various functions
∀v ∈ V, λ(v) =
    ψ(v)            if v ∈ L
    κ(v) ⊗ µ(v)     if v ∈ I
Holographic Reduced Representations allow us to represent locations
from the smallest to the largest scale using the same number of units. The
nodes are systematically related to each other such that information about
their constituents can be gathered without expansion of the nodes into
their children (see section 3.2.1). Hence, not only is the dimensionality
preserved, the framework design proposed here ensures that the nodes
are content addressable. This provides an ideal setup for the hierarchical
pathfinding process presented in section 3.2.
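The composition rule λ(v) = κ(v) ⊗ µ(v) can be sketched for a single parent with three children (a numpy sketch; variable names mirror the κ, µ, λ notation, and `merge`/`similarity` are illustrative helpers):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10_000

def merge(vectors):
    # Bit-wise majority vote; ties broken at random with probability 0.5.
    counts = np.sum(vectors, axis=0)
    out = (counts * 2 > len(vectors)).astype(int)
    ties = counts * 2 == len(vectors)
    out[ties] = rng.integers(0, 2, ties.sum())
    return out

def similarity(a, b):
    return np.mean(a == b)

# psi: random leaf state vectors with activation probability 0.5.
leaves = [rng.integers(0, 2, d) for _ in range(3)]

kappa = rng.integers(0, 2, d)       # unique key for the parent location
mu = merge(leaves)                  # merged vector of the children
lam = np.bitwise_xor(kappa, mu)     # state vector at the next scale

# Dimensionality is preserved, and unbinding the key recovers mu exactly
# (self-inverse of xor), which is similar to every child.
assert lam.shape == (d,)
recovered = np.bitwise_xor(kappa, lam)
assert all(similarity(recovered, leaf) > 0.7 for leaf in leaves)
```

Unbinding the key recovers µ exactly, and µ agrees with each child on about 75% of the bits; this is what makes the node content-addressable without expanding it into its children.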
3.1.2 Auto-associative Memory
In this pathfinding model, there are two main auto-associative memories
and a supplementary memory.
1. The Location memory stores all the locations at different scales such
as λ(0−1), λ(0−2), λ(1−1), λ(2−1), . . . , the keys κ(0−1), κ(1−2),
. . . This memory is queried to retrieve the location corresponding to
Figure 3.3: The Location memory stores all the state vectors and their associated keys. It stores as many state vectors as size(V) and as many keys as size(I).
the next scale, when the goal is not found at the current scale.
2. The packed hierarchy memory consists of associations between the
location keys across all the nodes from the top to the bottom of the
hierarchy (see figure 3.4). Before we arrive at a description of these
vectors, let us recursively define set Anc(v,i) as the first i ancestors
of node v:
Anc(v, i) =
    parent(v)                            if i = 1
    parent(v) ∪ Anc(parent(v), i − 1)    if i > 1
Let N be the set of all the parents of the leaf nodes. The vectors
that populate the packed hierarchy memory are computed as:
∀v ∈ N, π(v) = ∑_{i=1}^{height(T)} κ(v) ⊗ ∏_{a ∈ Anc(v,i)} κ(a)
This memory is queried to retrieve the location at the next higher
scale as well as to check whether the goal is present at a given loca-
tion.
3. The supplementary memory stores expressions obtained from su-
Figure 3.4: The different nodes along one branch of the hierarchy, from top to bottom, are packed into a sequence and stored in the packed hierarchy memory. The shaded path pertains to the vector: κ(1−5) + κ(1−5) ⊗ κ(2−2) + κ(1−5) ⊗ κ(2−2) ⊗ κ(3−1). The number of such vectors is given by the size of set N, the set of all parents of the leaf nodes.
perposing the key with the corresponding merged vector such as
κ(0−1) +µ(0−1), etc. This memory is queried to retrieve the state
vectors corresponding to a set of keys. There are size(I) vectors
of this type in the memory.
Note: It is assumed that locations within a grouping are interconnected
i.e., there exists a path connecting every pair of nodes that share a parent.
3.2 Processes
Now that we have a representation of the spatial hierarchy, let us see how
it is useful in finding the path from one location to another. Pathfinding
Figure 3.5: The Supplementary memory stores superposed expressions with the keys and the merged vectors.
happens sequentially from top to bottom, across the nodes at different
scales of the hierarchy. There are various sub-operations involved as shown
in the figure below. As we ascend the scales, the nodes preserve their
dimensionality; however they become conceptually heavier since more state
vectors are being packed. Thus, there is a tradeoff between efficiency of
hierarchical search and increase in the noise of the system.
To find a path from the start location to the goal location in this
environment, locations at higher scales are searched until a location is
found that contains both the start and goal locations. Once this location
has been found, the corresponding locations at smaller scales are retrieved
so as to find the path leading to the goal.
In the illustration above (Figure 3.6), let us look for the path from a
start location in λ(1−1) to a goal location in λ(1−5). One searches across
the hierarchy of scales up to λ(3−1) and is able to retrieve the path as a
subset of the nodes in the tree:
(start) → λ(2− 1)→ λ(3− 1)→ λ(2− 2)→ λ(1− 5)→ goal.
Figure 3.6: Example tree depicting nodes (discrete locations) at different scales. Pathfinding is hierarchical across the different scales.
This process and the associated sub-processes are summarized in the pseudocode below:
function pathfind(start, goal)
    current = start
    while not contains(current, goal)       (check for the goal; section 3.2.1)
        if current is the root node
            return goal not found
        current = retrieve location at the next scale       (section 3.2.2)
    end while
    (the goal was found at the scale of current;
     retrieve the ancestors of the goal: section 3.2.3)
    query = goal
    for i = 1 to scale(current) − 1
        probe the Location memory with query to retrieve the scale-i node
        query = query ⊗ (scale-i node)
    end for
    return the retrieved path
end function
This process is summarized in the high-level description of the search
process shown in figure 3.7. Each of the three operations is explained in a subsequent section: checking for the goal (3.2.1), moving up a scale (3.2.2), and retrieving all the ancestors of the goal node (3.2.3).
Figure 3.7: Flow chart depicting the various stages of the pathfinding process
3.2.1 Checking for the Goal
Searching for the goal g at any given location X is accomplished by query-
ing the packed hierarchy memory with g ⊗ X. This is also called probing
X with g [Kanerva, 1997]. This expression will retrieve a vector with a
Figure 3.8: Flow chart depicting the various steps involved in checking for the goal
high similarity value in the packed hierarchy memory iff X contains the
goal. This feature has been designed into the framework to ensure that
the nodes are content-addressable; otherwise, hierarchical search would not be
possible. A more comprehensive look at this property reveals two aspects
that make it possible:
• Algorithm Design: The location expressions are composed of merged
vectors and their associated location keys. The probe operation “un-
locks” a node only if it contains the goal node.
• Self-inverse property of xor: At a more fundamental level, the
working of the probe operation relies on the fact that xor can be
used as the binding and unbinding operation.
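A simplified sketch of this goal check (a numpy sketch: we probe the location with its key directly and threshold the similarity, rather than routing the query through the packed hierarchy memory; the helper names and the 0.55 threshold are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 10_000

def merge(vectors):
    # Bit-wise majority vote; ties broken at random.
    counts = np.sum(vectors, axis=0)
    out = (counts * 2 > len(vectors)).astype(int)
    ties = counts * 2 == len(vectors)
    out[ties] = rng.integers(0, 2, ties.sum())
    return out

def similarity(a, b):
    return np.mean(a == b)

children = [rng.integers(0, 2, d) for _ in range(3)]
outsider = rng.integers(0, 2, d)     # a location NOT packed into this node
kappa = rng.integers(0, 2, d)
X = np.bitwise_xor(kappa, merge(children))  # location at the higher scale

def contains(X, kappa, g, threshold=0.55):
    # Unlock X with its key (self-inverse of xor) and compare with the
    # goal; the similarity is well above chance only if g was packed in.
    return similarity(np.bitwise_xor(kappa, X), g) > threshold

assert contains(X, kappa, children[0])
assert not contains(X, kappa, outsider)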
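```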
3.2.2 Retrieving the next Scale
Figure 3.9: Flow chart depicting the various steps involved in retrieving the parent node
The current location at scale m when combined with all the previously
encountered locations at smaller scales can be used to retrieve the location
at the next higher scale. This is described in a pseudo code form below
followed by a more detailed description:
• The packed hierarchy memory is queried with current location at the
smallest scale (scale 0) which will return a high similarity with an
expression of the form:
StoSx = λ(0−x)+λ(0−x)⊗λ(1−y)+λ(0−x)⊗λ(1−y)⊗λ(2−z)+. . .
Note: This first step is performed only once - when the current
location is at scale 0.
• Let Y be the expression obtained from binding all the locations at
the different scales from the current location at scale 0 to scale m.
To retrieve the location at scale m+1 the location memory is queried
with: Y ⊗ StoSx.
These steps are summarized in the flow chart 3.9.
Note: For m>0, a disambiguation step is required since locations at
scale m+1 and scale m are returned. The returned location is either the
location at the next scale (desired) or the location at the current scale;
therefore comparison with the current scale location is sufficient to obtain
the location at the next scale.
3.2.3 Retrieving the Path
Once the goal has been found, all the ancestors to the goal node can be
retrieved by recursive querying of the memory. Let P be the packed hierarchy expression returned by the memory. The path is retrieved in the following steps:
• The Location memory is queried with the packed hierarchy vector P.
Let the retrieved vector be X. This would be the parent of the goal.
• Generate a query X ⊗ P . Use this to retrieve the parent of X from
the location memory; this is the new X.
Figure 3.10: Flow chart depicting the various stages in retrieving the path once the goal has been found. The memory is queried recursively for the children nodes associated with the node at which the goal was found.
• Repeat the previous step until all the ancestors of the goal have been retrieved.
• The final step in retrieving the path involves finding the merged
vectors corresponding to the keys retrieved in the previous step. The
supplementary memory is queried with the key κ to retrieve vector
PH.
• The location memory is queried with κ ⊗ PH to retrieve the corre-
sponding merged vector µ.
In the illustration above, the goal is found at scale 3, the smallest
scale containing both the goal and the start location. Let the expression
returned from the memory be called PH17. Using the goal key κ(0 − 17)
and PH17, we can successively retrieve the location keys at the different
scales along the path to the goal. This is an expansion operation with
scale keys.
κ(0−17) ⊗ PH17 = κ(0−17) ⊗ (κ(0−17)⊗κ(1−5) + κ(0−17)⊗κ(1−5)⊗κ(2−2) + κ(0−17)⊗κ(1−5)⊗κ(2−2)⊗κ(3−1))
= κ(1−5) + κ(1−5)⊗κ(2−2) + κ(1−5)⊗κ(2−2)⊗κ(3−1)
∼ κ(1−5)
Similarly, κ(0−17) ⊗ κ(1−5) can be used to retrieve the key at the next higher scale, κ(2−2). If the goal was found at scale n, the expansion operation involves n − 2 such steps to find the different scales at which the goal is present and to retrieve the path to the goal.
To retrieve the merged vectors for the keys, the supplementary memory is queried using the keys κ(1−5) and κ(2−2). This returns κ(1−5) + κ(1−5)⊗µ(1−5) and κ(2−2) + κ(2−2)⊗µ(2−2) respectively. These expressions are bound to their respective keys and are used to query the memory recursively, returning (0 + µ(1−5)) ∼ µ(1−5) and (0 + µ(2−2)) ∼ µ(2−2). Through this recursive sub-process, one retrieves the representations at different scales for the path towards the goal:

µ(3−1) → µ(2−2) → µ(1−5)
This path may be a novel route that was never traversed, which was re-
trieved based on the stored associations.
For a specific example of pathfinding see Appendix B.1.
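The expansion step can be sketched with xor and the thresholding merge (a numpy sketch of the κ(0−17) ⊗ PH17 computation above, with generic key names; `merge` and `similarity` are illustrative helpers):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 10_000
xor = np.bitwise_xor

def merge(vectors):
    # Bit-wise majority vote; ties broken at random.
    counts = np.sum(vectors, axis=0)
    out = (counts * 2 > len(vectors)).astype(int)
    ties = counts * 2 == len(vectors)
    out[ties] = rng.integers(0, 2, ties.sum())
    return out

def similarity(a, b):
    return np.mean(a == b)

# Keys along one branch: the goal leaf, then its ancestors at scales 1-3.
k_goal, k1, k2, k3 = (rng.integers(0, 2, d) for _ in range(4))

# Packed hierarchy vector for the goal leaf (cf. the expression above).
PH = merge([xor(k_goal, k1),
            xor(k_goal, xor(k1, k2)),
            xor(k_goal, xor(k1, xor(k2, k3)))])

# xor distributes across merge, so probing with the goal key strips it
# from every term, leaving a vector most similar to the scale-1 key.
probe = xor(k_goal, PH)
sims = [similarity(probe, k) for k in (k1, k2, k3)]
assert sims[0] == max(sims) and sims[0] > 0.7
```

The probe agrees with k1 on about 75% of the bits and with the deeper keys only at chance level, so a clean-up memory would return k1.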
3.3 Features of the Algorithm
This process exploits the property of the framework that each node in the
search tree is content-addressable to some extent. Direct accessibility to
nodes allows us to search across scales rather than sequentially across all
the locations. This avoids combinatorial explosion of costs in larger envi-
ronments. We have seen that the noise in the system increases with the
number of locations in the environment. One can overcome this drawback
and the subsequent loss in accuracy by breaking down the problem hier-
archically. Let the scale 1 node corresponding to the goal be known. The
problem is reduced to searching for the scale 1 node in the cognitive map
of the environment followed by searching for scale 0 node within the scale
1 node. Such a modification to the algorithm requires an additional set of sequences to be stored in the packed hierarchy memory, namely packed hierarchy sequences with scale 1 nodes.
3.4 Analyzing the Process
In this section I present the computational cost involved in different pro-
cesses and sub-processes.
Let there be n locations in the environment each represented as a vector
of dimension d. The preprocessing storage requirements for the three types
of auto-associative memories are as follows:
1. The Location memory contains the representations for all the lo-
cations at various scales of the hierarchical spatial map. We know
that if there are n leaf nodes then the maximum number of nodes
in the tree is given by the number of nodes in a binary tree with n
leaf nodes. Hence, if there are n locations at the lowest level of the
hierarchy, then the total number of locations is:
2^0 + 2^1 + . . . + 2^{log n} = (2^{log n + 1} − 1)/(2 − 1) = 2 · 2^{log n} − 1 = 2n − 1

Hence, the total storage space required is S = (2n − 1)d.
2. The supplementary memory stores all the keys, values and expressions combining the keys and values. All nodes in the tree are associated with keys and values except the ones at the lowest level of the hierarchy. Pursuing a line of analysis similar to the one above gives us the total number of keys/values as:

2^0 + 2^1 + . . . + 2^{log n − 1} = 2^{log n} − 1 = n − 1

Since there are three sets of expressions involving either keys, values or both, the total space required is S = 3(n − 1)d.
3. The packed hierarchy memory contains as many expressions as there
are locations, giving us a total storage requirement of (2 ∗ n− 1)d.
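These storage counts can be captured in a short helper (a sketch; the function name and the returned labels are ours, and a full binary hierarchy over n leaves is assumed):

```python
def storage_requirements(n, d):
    # Storage (in vector units) for the three memories, assuming n leaf
    # locations, a full binary hierarchy, and d-dimensional vectors.
    total_nodes = 2 * n - 1      # geometric series 2^0 + ... + 2^(log n)
    internal_nodes = n - 1       # every node except the n leaves
    return {
        "location": total_nodes * d,
        "supplementary": 3 * internal_nodes * d,  # keys, values, combined
        "packed_hierarchy": total_nodes * d,
    }

req = storage_requirements(128, 10_000)
assert req["location"] == 255 * 10_000
assert req["supplementary"] == 3 * 127 * 10_000
```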
Now we analyze the performance of the algorithm as the size of the environment increases. The expression at the highest scale contains a total of x · y · z = n terms merged together, where x, y and z are the branching factors at the successive levels. As we increase the number of
vectors that are packed into one sequence, the noise in the system increases
till checking for the goal no longer gives reliable results. Let K be the
number of terms in X = [A + B + . . . + D (+R)], where the random vector R breaks ties when K is even. To analyze the limit on the size of the environment, we compute the limit on K for which the probability that a given bit of X and A agrees, α(X,A), is significantly different from α(A,B), the corresponding probability for two random vectors A and B. Here α(X,A) = 1 − δ(X,A), where δ(X,A) is the expected normalized Hamming distance between X and A, i.e. the probability that a bit of X and A differs. It is given by
δ(X,A) = 1/2 − (1/2^K) · C(K−1, (K−1)/2)        (3.1)

where C(·,·) denotes the binomial coefficient. (For the derivation of δ, see appendix A.) This is approximated as 0.5 − 0.4/√(K − 0.44) [Kanerva, 1997].
The standard deviation of the binomial distribution for two random vectors is given by σ = √(δ(1 − δ)/N). For two random vectors of length N = 10,000 with δ = 0.5, σ = 0.005 is so small that δ(X,A) need not be far from 0.5 to be significantly different from it [Kanerva, 1997]. Sequences with K > 177 have an alpha value lower than 0.53, and those with K > 255 have an alpha value lower than 0.525. If relatively few runs are desirable, we could set an upper limit such as K = 200 for α(X,A) to remain reliably distinguishable from the alpha of two random vectors in fewer than 30 runs.
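Equation 3.1 and Kanerva's approximation are easy to check numerically (a sketch; Python's `math.comb` supplies the binomial coefficient, and the function names are ours):

```python
from math import comb, sqrt

def delta_exact(K):
    # Equation 3.1: expected normalized Hamming distance between the
    # merged vector X of K terms and one constituent A (K odd).
    return 0.5 - comb(K - 1, (K - 1) // 2) / 2 ** K

def delta_approx(K):
    # Kanerva's approximation 0.5 - 0.4 / sqrt(K - 0.44).
    return 0.5 - 0.4 / sqrt(K - 0.44)

def alpha(K):
    # Similarity of a constituent to the merged vector: 1 - delta.
    return 1 - delta_exact(K)

# alpha drifts toward chance (0.5) as more terms are packed together;
# it drops below 0.53 just past K = 177 and below 0.525 past K = 255.
assert alpha(177) > 0.53 > alpha(179)
assert alpha(255) > 0.525 > alpha(257)
assert abs(delta_exact(177) - delta_approx(177)) < 0.001
```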
Chapter 4
Extension to Continuous
Domain
“You have your way. I have my way. As for the right way, the correct
way, and the only way, it does not exist.” - F. Nietzsche
A second model of hierarchical spatial reasoning was developed. The
extension to continuous domain is realized using Plate’s Holographic Re-
duced Representations [Plate, 1991]. This representation scheme uses a
high-dimensional continuous space of real-valued vectors. To recap, the
similarity between two vectors is calculated as the normalized dot product. The merge operation is carried out through element-wise addition followed by normalization. The binding operator within this framework is the circular convolution operator shown in figure 4.1. This is a compressed outer
product with the unique property of preserving the dimensionality of the
resultant vector, a.k.a. the memory trace. Plate’s convolutional algebra
has the remarkable feature that it can decode the trace to retrieve one
of the original vectors that participated in the binding operation. The
involution of c, denoted by cT results in a vector d such that di = c−i,
where the subscripts are modulo-n. Involution operation functions as the
approximate inverse of convolution. Decoding is achieved by taking the
involution of one of the vectors used in encoding, and convolving it with
the memory trace.
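Circular convolution and involution can be sketched with the FFT, which computes circular convolution in O(n log n) (a numpy sketch; the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2048  # typical dimensionality in this chapter

def cconv(a, b):
    # Circular convolution computed via FFT.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def involution(c):
    # d_i = c_{-i}, subscripts modulo n: the approximate inverse for cconv.
    return np.roll(c[::-1], 1)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Elements drawn with mean 0 and variance 1/n, as in Plate's scheme.
a, b = rng.normal(0, 1 / np.sqrt(n), (2, n))

trace = cconv(a, b)                    # memory trace, same dimensionality
decoded = cconv(involution(a), trace)  # noisy reconstruction of b

assert trace.shape == (n,)
assert cosine(decoded, b) > 0.5        # clearly similar to b ...
assert abs(cosine(decoded, a)) < 0.2   # ... but not to a
```

Unlike xor, the decoding here is approximate: the reconstruction must be cleaned up against an item memory, which is exactly the role of the auto-associative memory.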
Figure 4.1: Circular convolution represented as a compressed outer product for n = 3 [Plate, 1995]
4.1 Extension to Framework
As in the case of the binary framework, state vectors for higher level nodes
are recursively composed of those at smaller scales. The compositional
rules, however, are different for the locations as well as the vectors stored
in the auto-associative memory. One does not require keys or merged
vectors in this scheme. This is made possible by the design of the state
vectors using the auto-convolution operation, i.e., convolution of a vector x with itself, x ~ x.
Once again, let T=(V,E) denote the tree representation of the spatial
environment where V is the set of all nodes in the tree and E (edges)
conceptually represents the hierarchical ordering. This set is sectioned
into a subset of leaf nodes L and internal nodes I. Each of the leaf nodes
is assigned a real-valued vector whose elements have mean 0 and variance 1/d [Plate, 1995] (the dimensionality d of the vector is typically assigned a value of 2048).
4.1.1 Autoassociative Memories
As mentioned previously, the merge, also denoted as summation in the equations below, is element-wise addition followed by normalization. The circular convolution operation is also referred to as a product in this chapter; since product and sum do not have default meanings in the world of distributed representations, this is reasonable. The continuous framework presented here requires two types of sequences to be stored in the
auto-associative memory:
• The Location memory which contains all the locations. The internal
nodes are composed as follows:
∀v ∈ I, λ(v) = ∑_{u ∈ children(v)} λ(u) ~ λ(u)
• The packed hierarchy memory consists of associations between loca-
tions at different scales. For each leaf node, vectors in this memory
pack the complete hierarchy from top to bottom. Before defining the
compositional rule, let us define set Anc(v,i) as the first i ancestors
Figure 4.2: Flow chart depicting the process of composing state vectors from nodes at smaller scales
of node v:
Anc(v, i) =
    parent(v)                            if i = 1
    parent(v) ∪ Anc(parent(v), i − 1)    if i > 1
The vectors that populate the packed hierarchy memory are com-
puted as:
∀v ∈ L, π(v) = λ(v) + ∑_{i=1}^{height(T)} λ(v) ~ ∏_{a ∈ Anc(v,i)} λ(a)
4.2 Extension to Process
From a high-level, the pathfinding process is essentially the same as in the
case of binary representations - search is hierarchical across the different
scales. However, the details of the operations have been modified for the
new framework presented in the previous section 4.1. The vectors are designed to ensure content-addressability of the nodes.
4.2.1 Checking for the Goal
Checking whether a particular node contains the goal requires the use
of the decoding (involution) operator introduced earlier. If the goal lo-
cation is represented by g and the location at the current scale is loc,
then loc contains g iff g^T ~ loc returns an expression with high similarity from the packed hierarchy memory. This confers the property of content-
addressability to the nodes of our tree, which in turn supports hierarchical
search.
4.2.2 Goal Retrieval
Let X be the expression returned from the auto-associative memory when
a node at scale n is probed with the goal g. Now, we can probe X with the
goal g to retrieve the goal location at scale n−1 from the packed hierarchy
memory, say f. Next we probe X with g ~ f to retrieve the goal at scale
n − 2 and so on. Thus we can recursively obtain goal locations at more
refined scales.
Chapter 5
Experiments and Results
“A casual stroll through the lunatic asylum shows that faith does not prove
anything.” -F. Nietzsche
The pathfinding process described in chapter 3 relies on being able
to directly query a node for any of its children. With increase in the
size of the environment, there is a decrease in the accuracy of identifying
whether a particular child node at a given scale is present. A proof of
correctness for the process is beyond the scope of this work. However,
the HRR framework was simulated in MATLAB. This chapter describes
results from experiments designed to test different aspects of the model.
5.1 Experiments on the Process
This section presents results on the accuracy and confidence with which
the goal is retrieved and how it varies with the size of the environment and
the dimension of the state vectors. The example presented in appendix
B.2 consisting of locations at three scales was simulated in MATLAB. The
search task is arbitrarily chosen as finding the path from λ(0− 1) to a far
off goal location λ(0−17). To locate the goal, the process must recurse up
to scale 3, the scale at which the start and the goal nodes share a common
ancestor. Since this node contains all the locations in the environment, we
can generate a two dimensional plot. If the diagonal elements have higher
similarity than the non-diagonal elements, then this provides evidence towards a successful check for the goal, i.e., the content-addressability of the nodes.
Figure 5.1: 2D plot showing the similarity between the packed hierarchy expression for a location and the sequence retrieved when scale 3 is probed with that location
The two dimensional plot shown in figure 5.1 depicts that the packed
hierarchy representation for a leaf node has a high similarity when this
location is the goal location and low similarity otherwise. The sequences in the packed hierarchy memory corresponding to the leaf nodes are represented along the x-axis. Hence, the x-axis ranges up to the number of
leaf nodes in the environment. The y axis represents the query when the
aforementioned leaf node is assigned as the goal location. In other words,
the column y=k represents the query to the memory when k is the goal
and current location is λ(3− 1). The (i, j)th cell of the plot represents the
similarity between the packed hierarchy for that leaf node and the query
expression obtained using that leaf node as the goal. The similarity values
are color coded as depicted in the color bar on the right.
[Plot: similarity value (0.49–0.56) vs. number of locations (0–140), with accuracy and confidence curves.]

Figure 5.2: Relationship of Accuracy and Confidence values with the number of locations
The accuracy value of a two-dimensional plot is defined as the average
of all the diagonal elements - it represents the average similarity between
the packed hierarchy sequence for a location and the sequence retrieved
when that location is assigned as the goal. The confidence value is defined
as the average of all the non-diagonal elements. It represents the average
similarity between the packed hierarchy sequence for a location and
the sequences retrieved when other locations are assigned as goals.
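Both quantities are simple functions of the similarity matrix underlying each two dimensional plot; a minimal sketch, assuming the matrix is available as a NumPy array:

```python
import numpy as np

def accuracy_confidence(S):
    """S[i, j]: similarity between the packed hierarchy sequence for
    leaf i and the query built with leaf j as the goal."""
    n = S.shape[0]
    accuracy = np.trace(S) / n                            # mean diagonal
    confidence = (S.sum() - np.trace(S)) / (n * (n - 1))  # mean off-diagonal
    return accuracy, confidence
```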
Figure 5.2 is created using the accuracy and confidence values
computed for each two dimensional plot as the number of nodes in the
tree is varied. The size of the environment ranges from 3 to 138 individual
locations (leaf nodes) each represented as 5000-dimensional vectors. As
expected, the confidence values are near 0.5 showing that the similarity
of the query for non-corresponding nodes is as good as random. Since
the number of scales is kept constant across environments of increasing
size, a greater number of locations is packed to obtain λ(3− 1). Hence we
expect the accuracy of retrieval to decrease as observed in the initial part
of the graph. However, the curve exhibits convergence when the size of
the environment is within a certain range.
These results pertain to the real-valued HRR model; similar results
were observed for the second model that uses binary representations.
5.2 Autoassociative Memory
Associative neural memories are a class of artificial neural nets that have
many similar properties to associative recall in the brain. Until now, the
autoassociative memory has been identified mainly by its functional
characteristics; little has been said about its specific architecture. The results
presented in the previous section use a list-based autoassociative memory
that computes similarities in a sequential fashion. This is neither biologically
realistic nor very efficient from the perspective of pathfinding. The
goal of this section is to implement a parallel,
distributed version of an associative neural memory. I examined the ef-
ficiency of two neural network models in autoassociative recall of binary
representations:
• MATLAB Neural Network Toolbox
• Hopfield Neural Network
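For reference, the list-based memory that these models are compared against can be sketched as follows; the class and its similarity threshold are hypothetical illustrations, not the thesis implementation:

```python
import numpy as np

class ListMemory:
    """List-based autoassociative (cleanup) memory: compares a query
    against every stored binary vector in sequence and returns the most
    similar item when the match clears a threshold."""

    def __init__(self, items, threshold=0.6):
        self.items = [np.asarray(v) for v in items]
        self.threshold = threshold

    def lookup(self, query):
        sims = [float(np.mean(query == v)) for v in self.items]  # 1 - Hamming
        best = int(np.argmax(sims))
        return self.items[best] if sims[best] >= self.threshold else None
```

Retrieval degrades gracefully: a query within roughly 40% Hamming distance of a stored item is still recovered, while an unrelated random query (similarity ≈ 0.5) returns None. The cost, however, is a sequential scan over all stored items, which is what the parallel models below try to avoid.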
5.2.1 Neural Net
The neural network toolbox in MATLAB was used to create a parallel,
distributed autoassociative memory. Network sub-objects such as the in-
puts, layers, outputs, targets, biases, weights and their connectivity are
accessible through the interface. The neural net in my experiments has 2
hidden layers. Both these layers have biases. The input is connected to the
first layer only. The second layer has weights coming from the first layer
only. The second layer is connected to the output. The neural net followed
the Batch Training Widrow-Hoff rule. This is a method for adapting the
weights of the network by minimizing the sum of the squares of the linear
errors [Widrow and Hoff, 1960].
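A minimal single-layer sketch of the batch Widrow-Hoff (LMS) update (the actual network has two hidden layers with biases; this reduction only illustrates the learning rule):

```python
import numpy as np

def widrow_hoff_batch(X, T, lr=0.5, epochs=200):
    """Batch LMS: adapt W to minimize the sum of squared linear
    errors ||T - X W||^2 over the whole training batch.
    X: (samples, inputs), T: (samples, outputs)."""
    W = np.zeros((X.shape[1], T.shape[1]))
    for _ in range(epochs):
        err = T - X @ W                # linear errors for the batch
        W += lr * X.T @ err / len(X)   # step along the negative gradient
    return W
```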
A matrix of vectors, also known as training samples, was given as input
to the network to create and train a pattern recognition neural net. To
test the performance of the network, a few of the input vectors were fed
into the network after undergoing mutation - each bit was flipped with a
certain probability of mutation. These vectors are called testing samples.
The output from the network was in the form of n numbers where n is
the number of patterns on which it was trained. A high value for the kth
output indicated that the testing vector was deemed by the network to
be closest to the kth input. Let us define accuracy as the output value
corresponding to the input that was mutated to obtain the testing vector.
Confidence is computed by subtracting from the accuracy the sum of all
output values that do not correspond to the input used to make the testing
vector. The performance of the network with different types of input
distributions was examined and compared.
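The mutation used to produce testing samples can be sketched as an independent bit-flip (a sketch, assuming the 0/1 vectors are stored as NumPy arrays):

```python
import numpy as np

rng = np.random.default_rng(42)

def mutate(v, p=0.01):
    """Flip each bit of a 0/1 vector independently with probability p."""
    flips = rng.random(v.shape) < p
    return np.where(flips, 1 - v, v)
```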
Type of Input Distribution
• Using extremely sparse, hard-coded vectors: 100 vectors of 100 bits
each such that the ith bit of the ith vector is 1 while all the other
bits are 0. This is not a very efficient encoding; this test is in essence
a worst-case scenario against which the other input distributions can be
compared. When a vector in this distribution
is mutated even by a small amount, say 1% (1 randomly chosen bit
out of a 100 is flipped), there is a good chance that it will lose its
unique identity; the mutated vector is now the same distance from
the original vector as it is from another vector in the distribution
(with 1 in the position where our vector was mutated).
• Using semi-sparse vectors:
100 vectors with 100 bits each were each assigned 10 bits as 1 and
the rest as zero. I explored two different approaches: the hard-coded
version and the stochastic version.
1. Hard-coded semi sparse vectors: A hard-coded sample distribution
is chosen to check the performance of the neural net on an input
that is well distributed over the input space given the constraints
(i.e., approximately 10% of the vector is 1). The rows of the matrix
below depict a few sample 10-bit regions of the vector (all other bits
being set to 0).
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 0 1 0 1 1 0 0 1
2. Stochastic semi-sparse vectors: Input distributions are created
such that a certain percent of the vector is randomly assigned
an activation value of 1. To ensure that the distribution covers
the whole space of inputs, the first 20 vectors of the distribution
are designed by choosing 1/0 with equal probability for the first
20 bits and setting the remaining 80 bits to 0. The next batch
of 20 vectors is assigned 1/0 with equal probability for the bits
from 20 to 40, the remaining bits being set to 0, and so on.
• Using random vectors with > 50% ones:
This type of input distribution is designed to have double random-
ness. 70− 90% of the vector is randomly assigned a value of 0. Not
only are the bits chosen with a probability of 0.5, but the percentage
of the vector that is 0/1 is also randomized (figure 5.4).

Figure 5.4: a) Vectors with 30 percent sparsity b) Doubly randomized vectors with sparsity between 70− 90%
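The three families of input distributions described above can be generated as follows (a sketch; the function names and parameters are illustrative, not the thesis code):

```python
import numpy as np

rng = np.random.default_rng(7)

def one_hot_vectors(n=100):
    """Extremely sparse, hard-coded: the i-th vector is 1 only at bit i."""
    return np.eye(n, dtype=int)

def stochastic_semi_sparse(n_vec=100, n_bits=100, block=20):
    """Each batch of `block` vectors draws 0/1 with equal probability
    inside its own `block`-bit window; all other bits stay 0."""
    V = np.zeros((n_vec, n_bits), dtype=int)
    for i in range(n_vec):
        start = (i // block) * block
        V[i, start:start + block] = rng.integers(0, 2, block)
    return V

def doubly_random(n_vec=100, n_bits=100):
    """Doubly randomized: 70-90% of each vector is 0, and the exact
    zero-fraction is itself chosen at random."""
    V = np.zeros((n_vec, n_bits), dtype=int)
    for i in range(n_vec):
        n_ones = int(n_bits * rng.uniform(0.1, 0.3))
        V[i, rng.choice(n_bits, n_ones, replace=False)] = 1
    return V
```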
5.2.2 Hopfield Net
Memories are retained as stable entities or Gestalts and can be correctly
recalled from any reasonably sized subpart. The bridge between simple
circuits and the complex computational properties of higher nervous sys-
tems may be the spontaneous emergence of new computational capabilities
from the collective behavior of large numbers of simple processing elements
[Hopfield, 2006]. The Hopfield net is a recurrent artificial neural network
that serves as a content-addressable memory. The state of the network
has a scalar energy value associated with it. It is trained on binary repre-
sentations which occupy lower-energy states in the high-dimensional input
state space. An input to the network will gravitate towards these stored
patterns.
Implementation:
I used the MATLAB recipe for the Hopfield network that is based
on research by Li et al., who studied a system that has the basic
structure of the Hopfield network but is “easier to analyze, synthesize, and
implement than the Hopfield model” [Li, 1989]. The MATLAB Hopfield
Net implementation uses the satlins function. For inputs less than -1
satlins produces -1. For inputs in the range -1 to +1 it simply returns the
input value. For inputs greater than +1 it produces +1. The input p to
the network specifies the initial conditions for the recurrent network. The
output of the network is fed back to become the input until it stabilizes
(see figure 5.5).
Figure 5.5: Hopfield Net: The output of the network is recursively fedback as input till the network stabilizes
I trained a Hopfield network on a set of target equilibrium points rep-
resented as a matrix T of vectors. The weights and biases for a recursive
Hopfield network were adjusted during the training phase, indicating that
learning had occurred. The network is guaranteed to have stable equilibrium
points at the target vectors, but it could contain other spurious
equilibrium points as well. In an autoassociative memory, concepts share
the same set of connections. By using inputs that are orthogonal or as
close to orthogonal as possible, we can optimally reduce the amount of
cross talk between different stored patterns [Hinton, 1989].
Figure 5.6 shows an orthogonal distribution of 50 one-hundred-dimensional
vectors used to train the network. Mutated versions of these were used
as testing samples. The network was successful in retrieving the stored
patterns.
Figure 5.6: Hopfield Net: The left hand and right hand figures depict the training samples and the output respectively. The red and blue regions signify an activation value of 1 and 0 respectively for that unit. The network was trained on 50 orthogonal patterns each represented along one column of the two dimensional plot. It successfully retrieved the stored patterns.
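The training and recall loop can be illustrated with a minimal Hebbian Hopfield memory; this sketch uses ±1 units with a synchronous sign update rather than the MATLAB satlins implementation:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product rule; patterns: (p, n) array of +/-1 values."""
    p, n = patterns.shape
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)          # no self-connections
    return W

def recall(W, x, steps=20):
    """Feed the output back as the input until the state stabilizes."""
    for _ in range(steps):
        x = np.where(W @ x >= 0, 1, -1)
    return x
```

With orthogonal stored patterns, crosstalk is minimal and a slightly corrupted pattern falls back into the stored attractor.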
Chapter 6
Event Related Potential
Experiment
6.1 Introduction
The firing of neurons is accompanied by change in the electrical potential
across the cell membrane. Event Related Potential (ERP) is an electro-
physiological technique used to record evoked potentials on the scalp in
response to a stimulus or directly after a cognitive response. The most
common type of ERP study records visual evoked potentials, which are
elicited in response to a visual presentation. ERPs have
high temporal resolution (1ms or better under optimal conditions) but
poor spatial resolution [Luck, 2005]. A comparison of the latency and
amplitude of the evoked potentials across the different experimental
conditions is done to find evidence for neural correlates of mental activity.
Previous studies have discovered that our attention gravitates
towards specific spatial locations that are sustained in the working memory
[Awh et al., 1998, Awh and Jonides, 2001]. I designed and conducted an
ERP study to explore the attentional orientation towards maps at differ-
ent scales while actively thinking about a certain path. The purpose of
this study is to test the hierarchical spatial reasoning paradigm at a high
level.
6.2 Methods and Procedures
The first ingredient required to test spatial reasoning is a spatial map.
Since the subjects were mostly students at Hampshire College, we consid-
ered using the campus map. We found a more appealing alternative - a
virtual 3D world consisting of locations at different scales (shown in figure
6.1). This controls for potential bias due to varying degrees of familiarity
with the environment. This was also considered a better choice than us-
ing maps at different locations which would introduce hierarchical bias in
the initial exposure to the environment. Moreover, being allowed to roam
around in a virtual world during the somewhat long process of placing the
electrode cap kept the participants engaged and attentive.
6.2.1 Training Phase
The ERP design is based on the subjects’ ability to reason about and
visualize paths between two locations. A number of buildings are situated
in the vicinity of each other. A turquoise fountain, a gazebo and a mailbox
are placed at strategic locations to orient the player. Two of the buildings
Figure 6.1: Virtual 3D environment explored by the participants.
have open entrances. The rooms within the building are marked with
shapes of certain colors that serve as landmarks. The participants are
provided helpful suggestions such as asking them to find their way to a
central location such as the fountain from where both the open buildings
are visible. After the exploration phase, the subjects are trained on a series
of pathfinding problems involving a pair of landmarks. These are either
both within the same building or in separate buildings. When they are
in separate buildings, the participant must go up a scale to visualize the
path across the neighborhood.
Figure 6.2: The inside of a building within the virtual world. Landmarks such as the red star and the blue circle facilitate the creation of a spatial map
6.2.2 Experiment Phase
The human brain has a highly developed cortex that is extremely good
at making predictions. We constantly anticipate perceptual and cognitive
occurrences. This may be further enhanced by repetition and cues. This
experiment uses a set of three images called maps (shown in figure 6.3).
These are aerial views of the neighborhood and of the two buildings. While
explaining the procedures of the experiment, we introduce the subjects to
these maps in an attempt to minimize novelty effects at the beginning of
the experiment. The experiment is carried out in 128 trial sets divided into
two epochs. A single trial set consists of a pathfinding task such as “[Find
Figure 6.3: From left to right: Map of Building 1, 2 and the neighborhood. The first two are locations at scale 0 while the neighborhood is at scale 1
your way from] the blue square to the green triangle” followed by 3 trials.
During a trial, the participant is shown one of the three maps and asked
to indicate whether it is helpful in visualizing the path (1 for helpful and
2 for not helpful). The three maps are shown in a random order within
each trial set. During a particular trial set, the pathfinding task is chosen
randomly from a set of 16 tasks. There are two such distinct sets, one for
each epoch. A few sample pathfinding tasks are shown in table 6.2. In
              Map B1   Map B2   Map C
Most Relevant    1        3        5
Not Relevant     2        4        6
Table 6.1: The condition numbers allotted to the 6 experimental conditions. Each of the three maps could be identified as “most relevant” or “not relevant” by the participant. However, their response is not taken into account in the analysis presented here.
Path                       Map B1        Map B2        Map C
Pink Circle to Red Square  Relevant      Not Relevant  Not Relevant
Blue Circle to Red Star    Not Relevant  Relevant      Not Relevant
Pink Circle to Red Star    Not Relevant  Not Relevant  Relevant
Table 6.2: The three maps and the two options (relevant or not relevant)create a total of 6 conditions
task 1 the pink circle and the red square are both in building 1 suggesting
that Map B1 is the most relevant. In task 3, the pink circle and the red
star are in separate buildings. Therefore, MapC is the most relevant. Each
pathfinding task belongs to one of six categories or conditions (summarized
in table 6.1). The pathfinding task serves as a contextual cue to prime
the individuals. The hypothesis is that the map corresponding to the
pathfinding task would become active such that a response is elicited in
advance of the stimulus presentation.
6.3 Results and Discussion
Six subjects participated in our study. Individuals who were on med-
ication, had consumed alcohol or drugs in the recent past, who had a
premature birth, etc. were excluded from the study. They were all be-
tween the ages of 20 and 25 and were students in the five-college area. We
rejected the ERP data for one subject as it contained too many artifacts
such as eye blinks and muscle movement. The waveforms shown in figure
6.4 were recorded from the electrodes above the occipital lobe, where visual
information is processed. The waveforms show one peak around 100ms and
another around 220ms. The second peak is of greater interest since it occurred after
the purely visual response and thus may reflect a higher level conceptual
response.
The waveforms were imported into Excel and then into SPSS for
statistical analysis. There was a significant within-subject effect, i.e., the
participants’ evoked potentials were significantly different between the dif-
ferent conditions (Huynh-Feldt measure, p < 0.05). A Paired samples T
test was conducted (see figure 6.5) and a significant difference was found
for Pair 1 (comparing conditions 1 and 2; t=-3.301, p=0.03). The
evoked potentials to the Map B1 was significantly different between the
conditions where it was relevant and not relevant to the pathfinding task.
This indicates that the pathfinding task did influence the mental state
of the participants while we recorded their neural/cognitive response to
the image. As expected, there was no significant difference between conditions
2 and 4 (t=-0.408, p=0.704), when the maps B1 and B2 were both
irrelevant. We know that map C contained the entire neighborhood and
is therefore always relevant to some extent. This is also empirically ob-
served as no significant difference is observed between conditions 5 and 6
(t=0.616, p=0.571). However, a comparison of conditions 1 and 6 shows
that the potentials evoked when map B1 was relevant were significantly
different from those evoked when map C was not relevant (t=-2.821,
p=0.048). This suggests that although C was relevant to a greater or lesser
extent in all the pathfinding tasks, its activation may have been suppressed
when Map B1 was more relevant. When one entity exerts inhibition
on another, it points to the possibility that they are part of the
same circuitry. This is an exploratory study with five subjects and
relatively few trials per subject. The evidence in support of the hierarchical
reasoning paradigm is encouraging. We expect to see greater differences
in the conditions in a follow-up study with more subjects.
Figure 6.4: Grand Average ERP Waveforms: These waveforms were created by averaging together the averaged waveforms of the individual subjects across the different trials. These recordings are from the cortical areas above the occipital lobe, the visual processing region.
Figure 6.5: The Paired Samples T test is used to compare the conditions pair-wise. Significant differences were found for Pair 1 and Pair 7 (p < 0.05). Refer to table 6.1 for the experimental conditions corresponding to the condition numbers.
Chapter 7
Conclusions and Future Work
To recapitulate, this thesis presents a novel model of hierarchical spatial
reasoning using Holographic Reduced Representations. The nodes of the
tree are content addressable which allows this framework to support effi-
cient hierarchical search. This avoids the problem of explosion in the cog-
nitive load that hounds some other hierarchical models. The distributed
framework affords many advantages to this model such as automatic
generalization, robustness in case of damaged or corrupted neural codes and
graceful degradation with increase in the number of locations. The bind-
ing operation is flexible owing to its useful properties of commutativity,
associativity, distributivity, stochasticity and self-inverse. This model has
many desirable properties such as fixed size of recursively composed neu-
ral codes, efficient path finding and scalability to large environments. For
all these reasons it is an attractive cognitive model of hierarchical spatial
reasoning.
This model of path finding captures the spirit of Hinton’s reduced de-
scription (shown below) [Hinton, 1990] although it is not an exact imple-
mentation.
1. Representational adequacy: The vector representing the loca-
tions (reduced description) cannot be used directly to retrieve all
the compositional keys and locations. However it can be queried to
retrieve information regarding the constituting locations.
2. Reduction: The full description of a location involves many keys
and locations at smaller scales. The reduced description is the same
size as any one of these constituents and thus satisfies the necessity
for reduction.
3. Systematicity: The reduced descriptions for locations are systematically
related to their constituting keys and locations, as described
by the compositional rules in Chapters 3 and 4.
4. Informativeness: X ⊗ Y will return a high similarity in the packed
hierarchy memory iff X is a location contained within Y at any hi-
erarchical level. Thus the locations are explicit in that they support
immediate accessibility. It is possible to probe a location once to
obtain information about smaller locations at any hierarchical level.
I would like to improve upon the model to support learning of representations
using external data. I am also interested in exploring an extension of
this model to more general environments and to three dimensions. Often,
the same environment can have competing hierarchical representations. I
Appendix A
Derivation of δ(X,A)
δ(X,A) is the probability that the corresponding bits of X and A are dif-
ferent. Let X = [A + B + · · · + D (+R)], where A, B, . . . , D are K binary
vectors. The square brackets stand for the thresholding operation - each
bit of the vector X will be 0 or 1 depending on which of them appears more
frequently in that position among the K vectors. (In case K is even, a
random vector R is added to make K odd.) This section presents the proof
that

δ(X,A) = 1/2 − (1/2^K) · C(K−1, (K−1)/2),

where C(n, r) denotes the binomial coefficient “n choose r”. For K = 1,
δ(X,A) = 0. For K = 3, one out of the four combinations (shown below)
is such that A is 0 and X is 1, hence δ(X,A) = 0.25.
The following derivation is applicable to cases where K > 1. Let us
consider a bit at a particular position in the vectors A, B, . . . , D, R, X.
Let m be the number of possible combinations where A is 0 and X is 1,
and let n be the number of possible combinations where A is 0. Then

δ(X,A) = m/n
The total number of possible combinations where A is 0 can be found
by counting all the combinations of the remaining K − 1 bits. This is
given by the sum of all the elements of the (K − 1)th row of Pascal's
triangle. Hence the number of possible combinations where A is 0 is
n = 2^(K−1).

Since X is 1, at least (K+1)/2 of the K − 1 remaining bits must be 1
(since A is 0). Hence, to find all possible combinations where A = 0 and
X = 1, we consider all combinations such that exactly (K+1)/2 bits are 1,
exactly (K+3)/2 bits are 1, and so on.
Let x = C(K−1, (K+1)/2) + C(K−1, (K+3)/2) + · · · + C(K−1, K−1).
By the symmetry of the binomial coefficients,

2x + C(K−1, (K−1)/2) = sum of all elements of row (K − 1) of Pascal's
triangle = 2^(K−1)

⇒ x = m = 2^(K−1)/2 − (1/2) · C(K−1, (K−1)/2)    (A.1)
Using n = 2^(K−1) and (A.1) to get the expression for delta:

δ(X,A) = m/n = (2^(K−1)/2 − (1/2) · C(K−1, (K−1)/2)) / 2^(K−1)

⇒ δ(X,A) = 1/2 − (1/2^K) · C(K−1, (K−1)/2)
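This closed form can be checked by exhaustively enumerating one bit position over all 2^K combinations (a small verification sketch):

```python
from itertools import product
from math import comb

def delta_exhaustive(K):
    """P(X's bit is 1 given A's bit is 0) for the bitwise majority
    X of K binary vectors, by brute-force enumeration (odd K)."""
    m = n = 0
    for bits in product([0, 1], repeat=K):
        if bits[0] == 0:              # condition on A's bit being 0
            n += 1
            if sum(bits) > K // 2:    # majority of the K bits is 1
                m += 1
    return m / n

def delta_formula(K):
    """delta(X, A) = 1/2 - (1/2^K) * C(K-1, (K-1)/2)."""
    return 0.5 - comb(K - 1, (K - 1) // 2) / 2 ** K
```

Both agree exactly for odd K; for K = 3 each gives 0.25, matching the count above.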
Appendix B
Pathfinding Example
Figure B.1: Spatial hierarchy depicted in graph-form. In this example, there are 18 locations at the most refined scale, which are grouped into 6 locations at scale 1, 2 locations at scale 2 and 1 location at scale 3
B.1 Binary Framework Example
Locations at scale n in the location hierarchy are denoted by λ(n− i), where
n denotes the scale and i denotes an index value for the location at that
scale. Hence, leaf nodes are represented as λ(0 − i). Leaf nodes are
clustered to obtain scale 1 locations such as λ(1 − 1), λ(1 − 2) etc., which are
further clustered into scale 2 locations such as λ(2 − 1) and λ(2 − 2). In
the example shown above the locations at scale 1 are generated from the
locations vectors as:
λ(1− 1) = κ(1− 1)⊗ (λ(0− 1) + λ(0− 2) + λ(0− 3) + λ(0− 4))
λ(1− 2) = κ(1− 2)⊗ (λ(0− 5) + λ(0− 6) + λ(0− 7) + λ(0− 8))
λ(1− 3) = κ(1− 3)⊗ (λ(0− 9) + λ(0− 10) + λ(0− 11) + λ(0− 12))
λ(1− 4) = κ(1− 4)⊗ (λ(0− 13) + λ(0− 14) + λ(0− 15) + λ(0− 16))
λ(1− 5) = κ(1− 5)⊗ (λ(0− 17) + λ(0− 18) + λ(0− 19) + λ(0− 20))
λ(1− 6) = κ(1− 6)⊗ (λ(0− 21) + λ(0− 22) + λ(0− 23) + λ(0− 24))
A location at scale 2 such as λ(2− 1) contains 3 of the 6 scale 1
locations, which implies that k = 3 and m = 6. Using the expression above,
we get the following scale 2 locations:
λ(2− 1) = κ(2− 1)⊗ (λ(1− 1) + λ(1− 2) + λ(1− 3))
λ(2− 2) = κ(2− 2)⊗ (λ(1− 4) + λ(1− 5) + λ(1− 6))
A scale 3 location contains two scale 2 locations and there are a total of
two scale 2 locations giving us one location at scale 3:
λ(3− 1) = κ(3− 1)⊗ (λ(2− 1) + λ(2− 2))
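These compositions can be sketched numerically if we assume, for illustration, that ⊗ is elementwise XOR (commutative, associative and self-inverse, as required here) and that the merge + is a bitwise majority vote; this is a binary spatter-code style sketch, not the thesis implementation itself:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10000                              # illustrative dimensionality

def rand_vec():
    return rng.integers(0, 2, d)       # dense random binary vector

def bind(x, y):
    return x ^ y                       # "⊗" as elementwise XOR

def merge(vs):
    """Bitwise majority vote with random tie-breaking."""
    s = np.sum(vs, axis=0)
    out = (2 * s > len(vs)).astype(int)
    ties = 2 * s == len(vs)
    out[ties] = rng.integers(0, 2, ties.sum())
    return out

def sim(x, y):
    return float(np.mean(x == y))      # 1 - normalized Hamming distance

# compose a scale-1 location from its key and three leaf locations
key, l1, l2, l3 = rand_vec(), rand_vec(), rand_vec(), rand_vec()
parent = bind(key, merge([l1, l2, l3]))
```

Because XOR is self-inverse, bind(key, parent) recovers merge([l1, l2, l3]) exactly, and its similarity to each child is 1 − δ = 0.75 for K = 3 while unrelated vectors sit near 0.5; this separation is what makes the probe-and-check steps below work.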
Let us randomly assign λ(0 − 1) as the start location and λ(0 − 17)
as the goal location. We begin with the knowledge of the state vectors
corresponding to the start and the goal locations: µ(0− 1), λ(0− 1), and
µ(0−17), λ(0−17). We probe the current location λ(0−1) with µ(0−17).
The packed hierarchy memory is queried with µ(0 − 17) ⊗ λ(0 − 1) =
µ(0 − 17) ⊗ κ(0 − 1) ⊗ µ(0 − 1). This does not return a high similarity
value. So we go to the next higher scale. The current location at the next
scale is retrieved in two steps:
1. The packed hierarchy memory is queried with the current location
λ(0−1). The following item in the memory will have a high similarity
and will be returned:
PH = λ(0− 1) + λ(0− 1)⊗ λ(1− 1) + λ(0− 1)⊗ λ(1− 1)⊗ λ(2−
1) + λ(0− 1)⊗ λ(1− 1)⊗ λ(2− 1)⊗ λ(3− 1)
2. The location at the next scale is retrieved by querying the Location
memory with
λ(0− 1)⊗ PH = λ(0− 1)⊗ (λ(0− 1) + λ(0− 1)⊗ λ(1− 1)
+ λ(0− 1)⊗ λ(1− 1)⊗ λ(2− 1) + λ(0− 1)⊗
λ(1− 1)⊗ λ(2− 1)⊗ λ(3− 1))
= 0 + λ(1− 1) + λ(1− 1)⊗ λ(2− 1)
+ λ(1− 1)⊗ λ(2− 1)⊗ λ(3− 1)
∼ λ(1− 1)
The location at the next scale — λ(1 − 1) will be returned with a
high similarity in the location memory.
Now, the packed hierarchy memory is queried using µ(0− 17):
µ(0− 17)⊗ λ(1− 1) = µ(0− 17)⊗ (κ(1− 1)⊗ (µ(0− 1) + µ(0− 2)
+ µ(0− 3) + µ(0− 4)))
No item in the packed hierarchy memory has a high similarity to this
expression. Hence, we expand to the next scale λ(2−1) and probe it using
µ(0− 17) again:
µ(0− 17)⊗ λ(2− 1) = µ(0− 17)⊗ (κ(2− 1)⊗ (λ(1− 1) + λ(1− 2) + λ(1− 3)))
= µ(0− 17)⊗ (κ(2− 1)⊗ (κ(1− 1)⊗ (λ(0− 1) + λ(0− 2)
+ λ(0− 3) + λ(0− 4)) + κ(2− 1)⊗ (κ(1− 2)⊗
(λ(0− 5) + λ(0− 6) + λ(0− 7) + λ(0− 8))) + κ(2− 1)⊗
(κ(1− 3)⊗ (λ(0− 9) + λ(0− 10) + λ(0− 11) + λ(0− 12)))
Once again none of the terms will be recognizable in the packed hier-
archy memory so we expand to the next scale λ(3 − 1) which is probed
with the goal.
µ(0− 17)⊗ λ(3− 1) = µ(0− 17)⊗ (κ(3− 1)⊗ (λ(2− 1) + λ(2− 2)))
= µ(0− 17)⊗ (κ(3− 1)⊗ (κ(2− 1)⊗ (λ(1− 1) + λ(1− 2)
+ λ(1− 3))
+ κ(3− 1)⊗ (κ(2− 2)⊗ (λ(1− 4) + λ(1− 5) + λ(1− 6)))
= µ(0− 17)⊗ (κ(3− 1)⊗ (κ(2− 1)⊗ λ(1− 1) + κ(2− 1)
⊗ λ(1− 2) + κ(2− 1)⊗ λ(1− 3))
+ κ(3− 1)⊗ (κ(2− 2)⊗ λ(1− 4) + κ(2− 2)⊗ λ(1− 5)
+ κ(2− 2)⊗ λ(1− 6))
(Distributivity of ⊗ over merge)
= µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 1)⊗ λ(1− 1) + µ(0− 17)⊗ κ(3− 1)
⊗ κ(2− 1)⊗ λ(1− 2)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 1)⊗ λ(1− 3) + µ(0− 17)
⊗ κ(3− 1)⊗ κ(2− 2)⊗ λ(1− 4)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ λ(1− 5) + µ(0− 17)
⊗ κ(3− 1)⊗ κ(2− 2)⊗ λ(1− 6)
(Distributivity of ⊗ over merge)
This expression contains the term:
µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ λ(1− 5)
= µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ (κ(1− 5)⊗ λ(0− 17)
+ κ(1− 5)⊗ λ(0− 18) + κ(1− 5)⊗ λ(0− 19) + κ(1− 5)⊗ λ(0− 20))
= µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 17)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 18)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 19)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 20)
Since µ(0 − 17) ⊗ λ(0 − 17) = κ(0 − 17), hence, the above expression
simplifies to:
= κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ κ(0− 17)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 18)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 19)
+ µ(0− 17)⊗ κ(3− 1)⊗ κ(2− 2)⊗ κ(1− 5)⊗ λ(0− 20)
The expression κ(3−1)⊗κ(2−2)⊗κ(1−5)⊗κ(0−17) is composed solely
of scale keys and is recognizable in the packed hierarchy memory which
returns the following expression:
PH17 = κ(0 − 17) ⊗ κ(1 − 5) + κ(0 − 17) ⊗ κ(1 − 5) ⊗ κ(2 − 2) + κ(0 −
17)⊗ κ(1− 5)⊗ κ(2− 2)⊗ κ(3− 1)
The goal is found at scale 3 – the smallest scale containing both the goal
and the start location. Using the goal key, κ(0 − 17) and PH17 obtained
in the last step, we can recursively retrieve the locations keys (at different
scales) associated with the goal. This retrieval is accomplished as follows.
1. Generate a query expression: κ(0− 17)⊗ PH17
2. Using this expression, query the Location memory. We retrieve the
scale 1 location key for the goal. In this example, we will get back
κ(1− 5).
3. The next query expression is generated as κ(0−17)⊗κ(1−5)⊗PH17.
4. Querying the Location memory with this returns the next higher key -
κ(2− 2).
In this way, recursive goal retrieval allows us to find the different scales
at which the goal is present. If the goal was found at scale n, the expansion
operation involves n-2 queries to the Location memory. The final step
in retrieving the path involves using the keys to find the location code
associated with the location keys. The location memory is queried using
the keys κ(1−5) and κ(2−2), which will return the expressions κ(1−5)+
κ(1−5)⊗µ(1−5) and κ(2−2)+κ(2−2)⊗µ(2−2) respectively. By probing
these expressions with the corresponding keys, we obtain 0 +µ(1− 5) and
0 + µ(2− 2). Hence, we have retrieved all the nodes that are ancestors to
the goal node.
Thus we arrive at the route to the goal: µ(3 − 1) → µ(2 − 2) →
µ(1 − 5). This path may be a novel route that was never traversed but
arrived at by associative recall of stored locations. This is made possible
by computational properties of the cognitive map proposed here.
B.2 Continuous Framework Example
Let there be 18 locations λ(0 − 1) to λ(0 − 18) at the lowest scale of the
hierarchy which are grouped into six scale 1 locations which in turn are
clustered as two locations at scale 2. These two locations together are
represented as a scale 3 location (the terms “node” and “location” are
used interchangeably here).
Figure B.2: Spatial hierarchy depicted in graph-form. In this example, there are 18 locations at the most refined scale, which are grouped into 6 locations at scale 1, 2 locations at scale 2 and 1 location at scale 3
λ(1− 1) = λ(0− 1) ~ λ(0− 1) + λ(0− 2) ~ λ(0− 2) + λ(0− 3) ~ λ(0− 3)
λ(1− 2) = λ(0− 4) ~ λ(0− 4) + λ(0− 5) ~ λ(0− 5) + λ(0− 6) ~ λ(0− 6)
λ(1− 3) = λ(0− 7) ~ λ(0− 7) + λ(0− 8) ~ λ(0− 8) + λ(0− 9) ~ λ(0− 9)
λ(1− 4) = λ(0− 10) ~ λ(0− 10) + λ(0− 11) ~ λ(0− 11)
+ λ(0− 12) ~ λ(0− 12)
λ(1− 5) = λ(0− 13) ~ λ(0− 13) + λ(0− 14) ~ λ(0− 14)
+ λ(0− 15) ~ λ(0− 15)
λ(1− 6) = λ(0− 16) ~ λ(0− 16) + λ(0− 17) ~ λ(0− 17)
+ λ(0− 18) ~ λ(0− 18)
The locations at scale 2 are computed as:
λ(2− 1) = λ(1− 1) ~ λ(1− 1) + λ(1− 2) ~ λ(1− 2) + λ(1− 3) ~ λ(1− 3)
λ(2− 2) = λ(1− 4) ~ λ(1− 4) + λ(1− 5) ~ λ(1− 5) + λ(1− 6) ~ λ(1− 6)
Finally, the location at scale 3 is given by:
λ(3− 1) = λ(2− 1) ~ λ(2− 1) + λ(2− 2) ~ λ(2− 2)
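Here ~ is circular convolution and λT denotes the involution used as its approximate inverse in the probes below. A small numerical sketch of these operations and the unbinding step (illustrative parameters, with a reduced dimensionality of 4096):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4096                                  # illustrative dimensionality

def hrr():
    return rng.normal(0, 1 / np.sqrt(d), d)   # random HRR vector

def cconv(x, y):
    """Circular convolution ("~") computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def involution(x):
    """x^T: reverse all but the first element; approximate inverse of x."""
    return np.concatenate(([x[0]], x[:0:-1]))

def sim(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# a scale-1 location composed from three auto-convolved leaves
a, b, c = hrr(), hrr(), hrr()
parent = cconv(a, a) + cconv(b, b) + cconv(c, c)
probe = cconv(involution(a), parent)      # a^T ~ parent, approximately a
```

Probing the parent with the involution of a member leaf yields a noisy copy of that leaf, which a cleanup (autoassociative) memory can recognize; probing with a non-member yields only noise.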
Without loss of generality, let us assign λ(0 − 1) as the start location
and λ(0− 17) as the goal location.
1. If the current (start) location is the goal location then search ends
here (trivial case).
2. Else, we retrieve the location at the next higher scale, a.k.a. the
parent node.
3. We check whether this is the goal location and if it is not found, step
2 is repeated until the goal is found at some scale. Subsequently, the
ancestors of the goal node are retrieved.
The current location at the next scale is retrieved in two steps:
1. The packed hierarchy (PH) memory is queried with the current lo-
cation λ(0− 1). The following item in the memory will have a high
similarity and will be returned:
PH = λ(0− 1) + λ(0− 1) ~ λ(1− 1) + λ(0− 1) ~ λ(1− 1) ~ λ(2−
1) + λ(0− 1) ~ λ(1− 1) ~ λ(2− 1) ~ λ(3− 1)
2. Its parent is retrieved by querying the Location memory with λ(0−
1) ~ PH:
∼ λ(0− 1) ~ (λ(0− 1) + λ(0− 1) ~ λ(1− 1) + λ(0− 1) ~ λ(1− 1)
~ λ(2− 1) + λ(0− 1) ~ λ(1− 1) ~ λ(2− 1) ~ λ(3− 1))
∼ 0 + λ(1− 1) + λ(1− 1) ~ λ(2− 1) + λ(1− 1) ~ λ(2− 1) ~ λ(3− 1)
∼ λ(1− 1)
The location at the next scale — λ(1 − 1) will be returned with a
high similarity in the location memory.
Now, the packed hierarchy memory is queried using λ(0− 17)T :
λ(0− 17)T ~ λ(1− 1) = λ(0− 17)T ~ (λ(0− 1) ~ λ(0− 1) + λ(0− 2)
~ λ(0− 2) + λ(0− 3) ~ λ(0− 3))
No item in the packed hierarchy memory has a high similarity to this
expression. Hence, we expand to the next scale λ(2−1) and probe it using
λ(0− 17) again:
λ(0− 17)T ~ λ(2− 1) ∼ λ(0− 17)T ~ (λ(1− 1) ~ λ(1− 1)
+ λ(1− 2) ~ λ(1− 2) + λ(1− 3) ~ λ(1− 3))
∼ λ(0− 17)T ~ (λ(1− 1) ~ (λ(0− 1) ~ λ(0− 1)
+ λ(0− 2) ~ λ(0− 2) + λ(0− 3) ~ λ(0− 3))
+ λ(1− 2) ~ (λ(0− 4) ~ λ(0− 4) + λ(0− 5) ~ λ(0− 5)
+ λ(0− 6) ~ λ(0− 6))
+ λ(1− 3) ~ (λ(0− 7) ~ λ(0− 7) + λ(0− 8) ~ λ(0− 8)
+ λ(0− 9) ~ λ(0− 9)))
Once again none of the terms will be recognizable in the packed hier-
archy memory so we expand to the next scale λ(3 − 1) which is probed
with the goal.
λ(0− 17)T ~ λ(3− 1) ∼ λ(0− 17)T ~ (λ(2− 1) ~ λ(2− 1) + λ(2− 2) ~ λ(2− 2))
∼ λ(0− 17)T ~ (λ(2− 1) ~ (λ(1− 1) ~ λ(1− 1)
+ λ(1− 2) ~ λ(1− 2) + λ(1− 3) ~ λ(1− 3))
+ λ(2− 2) ~ (λ(1− 4) ~ λ(1− 4)
+ λ(1− 5) ~ λ(1− 5) + λ(1− 6) ~ λ(1− 6)))
This expression contains the term:

λ(0-17)^T ~ λ(2-2) ~ (λ(1-4) ~ λ(1-4) + λ(1-5) ~ λ(1-5) + λ(1-6) ~ λ(1-6))
The distributivity of ~ over merge and the commutativity of ~ allow us to manipulate this expression further. It contains the following term:

λ(0-17)^T ~ λ(2-2) ~ λ(1-6) ~ λ(1-6) ∼ λ(0-17)^T ~ λ(2-2) ~ λ(1-6) ~ (λ(0-16) ~ λ(0-16) + λ(0-17) ~ λ(0-17) + λ(0-18) ~ λ(0-18))
∼ λ(0-17)^T ~ λ(1-6) ~ λ(2-2) ~ λ(0-16) ~ λ(0-16)
+ λ(0-17)^T ~ λ(1-6) ~ λ(2-2) ~ λ(0-17) ~ λ(0-17)
+ λ(0-17)^T ~ λ(1-6) ~ λ(2-2) ~ λ(0-18) ~ λ(0-18)
Using λ(0-17)^T ~ λ(0-17) ~ X ∼ X and the commutative property, the above expression simplifies to:

∼ λ(0-17)^T ~ λ(1-6) ~ λ(2-2) ~ λ(0-16) ~ λ(0-16)
+ λ(0-17) ~ λ(1-6) ~ λ(2-2)
+ λ(0-17)^T ~ λ(1-6) ~ λ(2-2) ~ λ(0-18) ~ λ(0-18)
The expression λ(0-17) ~ λ(1-6) ~ λ(2-2) does not contain any auto-convolution terms. It is recognizable in the packed hierarchy memory, which returns the following expression:

PH17 = λ(0-17) + λ(0-17) ~ λ(1-6) + λ(0-17) ~ λ(1-6) ~ λ(2-2)
The goal is found at scale 3, the smallest scale containing both the goal and the start location. Using the goal location λ(0-17) and the trace PH17 obtained in the last step, we can recursively retrieve the locations at different scales associated with the goal. This retrieval is accomplished as follows.
1. Generate the query expression λ(0-17)^T ~ PH17.
2. Using this expression, query the Location memory to retrieve the scale 1 goal node. In this example, we will retrieve λ(1-6).
3. The next query expression is generated as λ(0-17)^T ~ λ(1-6)^T ~ PH17.
4. Query the Location memory to retrieve the next higher key, λ(2-2).
In this way, recursive goal retrieval allows us to find the ancestors of the goal node. If the goal was found at scale n, the expansion operation involves n-1 queries to the Location memory.
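The recursive retrieval loop can be sketched numerically in the same Plate-style HRR setting (the dimension, the seed, and the small cleanup memory with a distractor item are illustrative assumptions):

```python
import math
import random

def cconv(a, b):
    """Circular convolution (the ~ operator)."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def inv(a):
    """Involution (the ^T operator), approximate inverse under ~."""
    n = len(a)
    return [a[-k % n] for k in range(n)]

def sim(a, b):
    """Cosine similarity between two traces."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def randv(n, rng):
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]

def retrieve_ancestors(goal, PH, memory, levels):
    """Recursively decode the goal's ancestors from its packed-hierarchy trace.
    Each round probes with (goal ~ ancestors-so-far)^T ~ PH and cleans up
    against the Location memory, as in steps 1-4 above."""
    query, ancestors = goal, []
    for _ in range(levels):
        probe = cconv(inv(query), PH)
        best = max(memory, key=lambda k: sim(probe, memory[k]))
        ancestors.append(best)
        query = cconv(query, memory[best])  # extend the query with the new key
    return ancestors

n, rng = 256, random.Random(3)
g, l16, l22, other = (randv(n, rng) for _ in range(4))
# PH17-style trace: λ(0-17) + λ(0-17)~λ(1-6) + λ(0-17)~λ(1-6)~λ(2-2)
PH17 = add(g, cconv(g, l16), cconv(cconv(g, l16), l22))
memory = {"l16": l16, "l22": l22, "other": other}
```

Here the involution of a bound query is formed by binding the retrieved key into the running query, which is equivalent since the involution distributes over circular convolution.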
B.3 Dimensionality
Figure B.3: Relationship of Accuracy and Confidence values to Dimensionality. (The plot shows similarity values, ranging roughly from 0.495 to 0.52, against state-vector dimensions from 1000 to 10000, with one curve for accuracy and one for confidence.)
The sensitivity of the pathfinding process to the number of units in the distributed representation was examined. Since the number of possible n-bit binary distributed representations is 2^n, even moderate dimensionalities provide ample representational capacity; accordingly, increasing the dimensionality of our state vectors does not have a significant effect on the accuracy and confidence values of pathfinding.
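This flatness can be illustrated by measuring bind-then-decode similarity across dimensions. This is a rough sketch, not the thesis's experiment: it uses real-valued HRRs rather than binary codes, and the dimensions, trial counts, and seed are arbitrary choices:

```python
import math
import random

def cconv(a, b):
    """Circular convolution (the ~ operator)."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def inv(a):
    """Involution, approximate inverse under circular convolution."""
    n = len(a)
    return [a[-k % n] for k in range(n)]

def sim(a, b):
    """Cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def randv(n, rng):
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

def mean_decode_sim(n, trials, rng):
    """Average similarity of inv(a) ~ (a ~ b) to b: decode quality at dimension n."""
    total = 0.0
    for _ in range(trials):
        a, b = randv(n, rng), randv(n, rng)
        total += sim(cconv(inv(a), cconv(a, b)), b)
    return total / trials

rng = random.Random(11)
results = {n: mean_decode_sim(n, 4, rng) for n in (64, 128, 256)}
```

For a single bound pair, the mean decode similarity hovers around the same value (roughly 0.7) at every dimension tested; what dimensionality buys is lower variance and capacity for more superposed items, not a higher single-pair similarity, which is consistent with the flat curves in Figure B.3.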
Bibliography
[Awh and Jonides, 2001] Awh, E. and Jonides, J. (2001). Overlapping mechanisms of attention and spatial working memory. Trends in Cognitive Sciences, 5(3):119–126.
[Awh et al., 1998] Awh, E., Jonides, J., and Reuter-Lorenz, P. A. (1998). Rehearsal in spatial working memory. Journal of Experimental Psychology: Human Perception and Performance, 24(3):780–790.
[Car and Frank, 1994] Car, A. and Frank, A. (1994). General principles of hierarchical spatial reasoning - the case of wayfinding. Proceedings of the Conference on Spatial Data Handling.
[Cohn et al., 2002] Cohn, A. G., Magee, D. R., Galata, A., Hogg, D. C., and Hazarika, S. M. (2002). Towards an architecture for cognitive vision using qualitative spatio-temporal representations and abduction. In Spatial Cognition III, pages 232–248. Springer-Verlag.
[Dolins and Mitchell, 2010] Dolins, F. L. and Mitchell, R. W. (2010). Spatial Cognition, Spatial Perception. Cambridge University Press, New York, NY, USA.
[Essen and Maunsell, 1983] Essen, D. C. V. and Maunsell, J. H. (1983). Hierarchical organization and functional streams in the visual cortex. Trends in Neurosciences, 6:370–375.
[Etienne and Jeffery, 2004] Etienne, A. S. and Jeffery, K. J. (2004). Path integration in mammals. Hippocampus, 14(2):180–192.
[Felleman and Van Essen, 1991] Felleman, D. J. and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47.
[Hawkins and Blakeslee, 2004] Hawkins, J. and Blakeslee, S. (2004). On Intelligence. Henry Holt and Company.
[Hinton, 1989] Hinton, G. E. (1989). Parallel Models of Associative Memory. L. Erlbaum Associates Inc., Hillsdale, NJ, USA.
[Hinton, 1990] Hinton, G. E. (1990). Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46:47–75.
[Hopfield, 2006] Hopfield, J. J. (2006). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences.
[Hopfield, 2010] Hopfield, J. J. (2010). Neurodynamics of mental exploration. Proceedings of the National Academy of Sciences, 107(4):1648–1653.
[Huang et al., 1997] Huang, Y.-W., Jing, N., and Rundensteiner, E. (1997). A hierarchical path view model for path finding in intelligent transportation systems. GeoInformatica, 1:125–159. DOI 10.1023/A:1009784527790.
[Kanerva, 1993] Kanerva, P. (1993). Associative Neural Memories: Theory and Implementation, chapter Sparse Distributed Memory and Related Models, pages 50–76. Oxford University Press, New York, NY, USA.
[Kanerva, 1997] Kanerva, P. (1997). Fully distributed representation. Proceedings of 1997 Real World Computing Symposium (Tokyo, Japan), pages 358–365.
[Kanerva, 1998] Kanerva, P. (1998). Dual role of analogy in the design of a cognitive computer. Advances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences.
[Kubie and Fenton, 2008] Kubie, J. L. and Fenton, A. A. (2008). Heading-vector navigation based on head-direction cells and path integration. Hippocampus, 9999(9999).
[Li, 1989] Li, J. (1989). Analysis and synthesis of a class of neural networks: linear system operating on a closed hypercube. IEEE Transactions on Circuits and Systems, 36:1405–1422.
[Luck, 2005] Luck, S. J. (2005). An introduction to the event-related potential technique. MIT Press, Cambridge, MA.
[McLeod et al., 1998] McLeod, P., Plunkett, K., and Rolls, E. T. (1998). Introduction to Connectionist Modeling of Cognitive Processes. Oxford University Press, New York, NY, USA.
[Mollison, 2005] Mollison, M. (2005). Event-related potentials in humans during spatial navigation. PhD thesis, Brandeis University, Waltham, MA, USA.
[O’Keefe and Nadel, 1978] O’Keefe, J. and Nadel, L. (1978). The hippocampus as a cognitive map. Clarendon Press.
[Plate, 2002] Plate, T. (2002). Distributed Representations. Macmillan Encyclopedia of Cognitive Science.
[Plate, 2003] Plate, T. (2003). Holographic reduced representation: distributed representation for cognitive structures. CSLI lecture notes. CSLI Publications.
[Plate, 1991] Plate, T. A. (1991). Convolution algebra for compositional distributed representations. Proceedings of the 12th International Joint Conference on Artificial Intelligence, pages 30–35.
[Plate, 1995] Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3).
[Plate, 2000] Plate, T. A. (2000). Analogy retrieval and processing with distributed vector representations. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, Special Issue on Connectionist Symbol Processing, 17(1):29–40.
[Rachkovskij, 2001] Rachkovskij, D. A. (2001). Representation and processing of structures with binary sparse distributed codes. IEEE Transactions on Knowledge and Data Engineering, 13:261–276.
[Reese and Lipsitt, 1975] Reese, H. W. and Lipsitt, L. P. (1975). Advances in child development and behavior. Academic Press Inc.
[Rohrbein et al., 2003] Rohrbein, F., Schill, K., Baier, V., Stein, K., Zetzsche, C., and Brauer, W. (2003). Motion shapes: empirical studies and neural modeling. In Spatial Cognition III, pages 305–320. Springer-Verlag, Berlin, Heidelberg.
[Tolman, 1948] Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4):189–208.