
Accepted for publication in the IEEE Transactions on Neural Networks

(Date of acceptance January 1997)

CHARACTERISTICS OF MULTIDIMENSIONAL HOLOGRAPHIC ASSOCIATIVE MEMORY IN RETRIEVAL WITH DYNAMICALLY LOCALIZABLE ATTENTION

Javed I. Khan & D. Y. Yun

Contact: Laboratories of Intelligent and Parallel Systems
Department of Electrical Engineering
493 Holmes Hall, 2540 Dole Street
University of Hawaii at Manoa
HI-96822, USA
Phone: (808)-956-3868
Fax: (808)-941-1399
[email protected]

SUMMARY

This paper presents the performance analysis (capacity and retrieval accuracy) of Multidimensional Holographic Associative Memory (MHAC). MHAC has the unique ability to retrieve pattern-associations with changeable attention. In attention actuated retrieval the user can dynamically select any subset of the elements in the example query pattern and expect the memory to confine its associative match only within the specified field of attention. Existing artificial associative memories lack this ability. Also, most of these models need at least 50% of the bits in the input pattern to be correct for successful retrieval. MHAC, with the unique ability of localizable attention, can retrieve information correctly even with cues as small as 10% of the query frame. This paper investigates the performance of MHAC in attention actuated retrieval both analytically and experimentally. Besides confirmation, the experiments also identify an operational range space (ORS) for this memory within which various attention based applications can be built with a performance guarantee.


1. INTRODUCTION

The modern research in distributed and parallel models of Artificial Associative Memory (AAM) started with McCulloch and Pitts' invention of the formal neuron in 1943. This invention for the first time provided a formal architecture for brain-like distributed processing of information. It was extraordinary because, reinforced by the theory of symbolic logic (Russell & Whitehead, 1910, 1912, 1913), it promised universal computability and the artificial realizability of almost unlimitedly complex systems [21,24]. The optimism it sparked was followed by a vigorous and immensely productive era of research in artificial neuro-computing.

However, beginning with Rosenblatt and continuing to this day, researchers have focused on, and in many ways confined themselves to, the perfection of the learning behavior of these artificial systems. During these years, increasingly intricate and complex properties of learning phenomena have been pursued in great depth. Versatility (how arbitrarily complex associations can be learned), efficiency (how more patterns can be learned), learnability of causality (Klopf 1987), learnability of temporal relations, learning in the continuum of time (Grossberg 1967), self-organization (Kohonen 1987, Oja 1982), and autonomous unsupervised adaptation (Grossberg 1976, Carpenter & Grossberg 1987) are just a few examples of the successes and intricacies through which research in artificial learning matured [6,15,16,22,7,2]. Surprisingly, during these enormously productive years, few attempts have been made to examine the recollection aspects of AAMs beyond assuming a very simple model of retrieval for all these forms of learning.

Almost all of the proposed learning models since McCulloch and Pitts have been constructed on the assumption of a simple and restricted retrieval scenario¹. In this scenario the sample from the content which is used during query is a close replica of the target. However, a more complex and versatile retrieval formalism is not only conceivable but also seems to be an integral part of natural associative memories. The ability to almost effortlessly infuse attention during retrieval is one such aspect of natural recollection.

¹ Consequently, most of these learning methods break down when the test of learning is based on the generalized retrieval scenario.

The phenomenon is explained through an example of image perception. Let an associative memory be allowed to learn the image frames A, B and C of Fig-1. If, during retrieval, template-D is used as a sensory input, then it is natural to expect that the memory should retrieve frame-A based on the roller, which appears to be the most cognitively significant index in the template.


However, it can be demonstrated that most of the conventional AAMs will instead retrieve frame-C as the closest match (indeed, B and C are closer to D than A, both in the least mean square (LMS) and the maximum dot-product sense). The reason for such an unexpected result is the statistical weakness of the cognitively important roller pixels compared to the statistical strength of the cognitively less important background pixels. In contrast, a natural memory seems to be immune to such statistical weakness and can retrieve information by localizing attention on cognitively important zones.

Fig-1 Attention Modulated Retrieval (learned frames A, B, C; query templates D, E)

The most intriguing aspect of natural associative memory is that it can change the distribution of attention over the element space dynamically during query. Consider template-E. There are two objects of focus and two possible answers. If desired, a natural memory can shift its attention to any other object in the template (for example, to the plant) and retrieve an entirely different match (frame-B), apparently without any significant internal reorganization. In contrast, a conventional AAM lacks such flexibility. For a given state of learning, it acts as a deterministic machine where each initial state flows into a pre-determined single attractor. Conventional AAMs have no mechanism to accommodate dynamic (post-learning) change in the distribution of attention over their element space.

A serious consequence of such attention deficiency in conventional AAMs is their inability to work with a small cue. A conventional AAM requires the effective cue to be statistically significant compared to the overall pattern size. For correct retrieval, the effective cue should be at least 50% of the pattern size for any AAM [10]. This is quite unrealistic for many applications. Interestingly, experiments performed by previous researchers contain empirical evidence of such severe retrieval inadequacy in the existing AAM models [18,27,8].

Khan [10] has recently demonstrated that an associative computation model called Multidimensional Holographic Associative Computing (MHAC), based on hyperspherical representation, can overcome these limitations. It has been demonstrated that MHAC is capable of retrieving associatively learnt information with dynamically changeable attention over the element set of the query pattern.

The representation, learning and retrieval model of this memory has been derived from the principles of holography². The detail of the derivation of MHAC from holographic representation and its analyses can be found in [10]. This paper presents the performance analysis of the model: it formally investigates the relationship between degree of focus, retrieval accuracy, capacity, and scalability of this attentive memory both analytically and experimentally.

² An excellent background description of holography itself has been recently published in [23].

The following section first describes the concept of this attentive memory. Section 3 briefly presents the computational model. Section 4 then presents the detailed analysis of the performance of this model. Finally, section 5 presents the result of extensive computer simulation that empirically confirms the analytical derivations.

In addition to the empirical observation of the critical characteristics of this memory, this section also presents the operational range space (ORS). The ORS can assist application designers in developing efficient applications on this model by providing precise value ranges of the critical design parameters.

2. ATTENTIVE MEMORY

2.1 Concept: Bimodal Associative Memory

A pattern is a collection of elements. Let a stimulus pattern and the corresponding response pattern be denoted by the symbolic vectors $S^\mu = [s_1^\mu, s_2^\mu, \ldots, s_n^\mu]$ and $R^\mu = [r_1^\mu, r_2^\mu, \ldots, r_m^\mu]$. Each of the individual elements in these vectors represents a piece of information. The superscript refers to the index of the pattern and the subscript refers to the index of the element in it. The values of these elements correspond to a measurement obtained by some physical sensor.

Fig-2 Information Flow Model of Bimodal Memory (information plane and meta plane; the meta quantities are assertion on the encoded information, attention on the query information, and confidence on the retrieved information)

A memory has three information channels (as shown in the bottom plane of Fig-2). The first is the encoder input, where stimulus and response pattern pairs are received during learning. The second is the decoder input, where the query stimulus pattern is received from the inquirer. The third is the decoder output, where the response pattern is generated by the memory as a reaction to the query. A conventional associative memory processes only the measurement components of information, in such a way that:

Definition 1: An Associative Memory, given a set of p stimulus pattern vectors $S = \{S^\mu \mid 1 \le \mu \le p\}$ and a set of an equal number of response pattern vectors $R = \{R^\mu \mid 1 \le \mu \le p\}$, learns the relationship between a stimulus member $S^\mu \in S$ and the corresponding response member $R^\mu \in R$ in such a way that, given a query pattern $S^Q$, it can retrieve a pattern $R^R$ such that $R^R \approx R^T$, where $R^T \in R$ and its stimulus $S^T \in S$ is closest to $S^Q$ according to some matching criterion $D()$.

An associative memory system is comprised of (i) a learning algorithm A_learn which converts all the associations into some internal representation, (ii) a physical storage medium and representation formalism A_M to store the associations, (iii) a decoding algorithm A_retrieve to recollect stored information from a given query stimulus $S^Q$, and (iv) a matching criterion $D()$ to measure the closeness of the stored stimulus patterns to the query pattern. The actual form of $D()$ may vary between AM models. In section 2.3 some pertinent forms of this function are further illustrated.
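To make this four-part decomposition concrete, the following Python sketch frames it as a minimal interface; the class and method names and the exhaustive nearest-neighbor search are illustrative assumptions, not the paper's construction.

```python
from typing import Callable, List
import numpy as np

class AssociativeMemory:
    """Skeleton of the four components: A_learn, the substrate A_M,
    A_retrieve, and the matching criterion D."""

    def __init__(self, dist: Callable[[np.ndarray, np.ndarray], float]):
        self.dist = dist                  # matching criterion D
        self.S: List[np.ndarray] = []     # stored stimuli (substrate A_M)
        self.R: List[np.ndarray] = []     # stored responses

    def learn(self, s: np.ndarray, r: np.ndarray) -> None:
        """A_learn: here, simply store the association."""
        self.S.append(s)
        self.R.append(r)

    def retrieve(self, s_query: np.ndarray) -> np.ndarray:
        """A_retrieve: return the response whose stimulus minimizes D."""
        best = min(range(len(self.S)),
                   key=lambda m: self.dist(s_query, self.S[m]))
        return self.R[best]
```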

A conventional memory formalism processes only the measurements associated with the information elements in the above model. In contrast, the conceptual memory model of MHAC is based on a formalism which assumes that the trust in each piece of transacted information is inherently nonconforming: the measurements associated with the information elements are individually susceptible to distortion, loss, or even purposeful disregard. The formalism includes the meta-knowledge about the state of each given piece of information (measurement) as an integral part of its basic notion of information.

The proposed formalism adopts an additional meta-knowledge plane (as shown in the upper plane in Fig-2). The linguistic interpretation of the quantities of this meta plane varies depending on the channel. For the encoded information, this meta-knowledge corresponds to a form of assertion from the encoder. For the query pattern, it corresponds to a form of attention on the part of the inquirer. For the memory response, it corresponds to the confidence on the retrieved information as assessed by the memory itself.

Formally, each element of information is modeled as a bimodal pair $s_k^\mu = \{\alpha_k^\mu, \beta_k^\mu\}$, where $\alpha$ represents the measurement of the information element and $\beta$ represents the meta-knowledge associated with this measurement.

The above formalism, in the context of a general memory (irrespective of its implementation mechanism) which computes on imperfect knowledge, generates some specific expectations about the operational behavior of these meta quantities. These are stated below.

Expectation on the Inflow of Meta-Knowledge: The memory matching criterion should put more importance on a piece of information that is attributed with a high degree of $\beta$ in the query than on a piece attributed with a low $\beta$. The expectation can be stated as a matching criterion:

$$D\left(S^Q, S^{T_a}, B\right) = \sum_i^n \beta_i \,\mathrm{dist}\left(\alpha_i^Q, \alpha_i^{T_a}\right) \quad \ldots(1)$$

Here N is the set of all elements in the pattern vector and the set has cardinality n. The index variable i varies from 1 to n and thus the summation includes all the elements in the set N. The function dist() denotes a measure of distance between individual pattern elements. The additional input $B$ denotes the meta-vector. In the context of encoding, when $B^\mu$ is specified dynamically during encoding, this expectation corresponds to a learning criterion that realizes learning with changeable assertion (LCA). In the context of query, when $B^Q$ is specified dynamically during query, it corresponds to a matching criterion that realizes retrieval with changeable attention (RCA). It is the latter meta-knowledge on which the rest of this paper will focus.
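As a minimal illustration, equation (1) reduces to a one-liner in numpy; the absolute difference here stands in for dist(), which the text leaves application-dependent.

```python
import numpy as np

def attentive_distance(alpha_q, alpha_t, beta):
    """Equation (1): attention-weighted element distance.

    alpha_q, alpha_t : measurement vectors of the query and a stored stimulus
    beta             : meta (attention) vector, one weight per element
    """
    return float(np.sum(beta * np.abs(np.asarray(alpha_q) - np.asarray(alpha_t))))
```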

The incorporation of meta-knowledge into the basic notion of information goes beyond the important concept of attention. A second, symmetric expectation related to the outflow of meta-knowledge provides completeness to this attempt at delineating the behavior of a new memory.

Expectation on the Outflow of Meta-Knowledge: If the $\alpha$ values of the query demonstrate a high degree of resemblance to the $\alpha$ values of an a priori encoded stimulus pattern, then the memory should retrieve the associated response with a higher degree of accuracy and a high degree of $\beta$. On the other hand, if they do not, it should generate a response with a low degree of $\beta$, as detailed in Table-1.

    β_query   α_query     β_response   α_response
    HIGH      CLOSE       HIGH         CLOSER
    LOW       CLOSE       HIGHER       CLOSER
    HIGH      NOT CLOSE   LOW          CLOSEST / DON'T CARE
    LOW       NOT CLOSE   LOWER        CLOSEST / DON'T CARE

Table-1 Expectations

The Inflow Expectation relates to the inward communication of the meta-knowledge into the memory system. An external querying system (it can be a human user or another computer system) supplies the stimulus elements and, additionally, the significance level of each stimulus element. The Outflow Expectation relates to the outward communication from the memory to the external querying system. In the reply, the querying system is given back not only the retrieved measurements but also the meta-knowledge confidence about the status of the retrieved content. Both of these transfers are essential in the context of imperfect knowledge transaction.

The above expectations essentially constitute the behavioral definition of a memory system which incorporates the possibility of imperfection in the given measurements. In the rest of this paper such a memory will be referred to as an Attentive Memory. The next section presents the actual computational model, which realizes at least one instance of the attentive memory by satisfying these expectations.

2.2 Dynamically Changeable Attention

In the context of the above definition of memory, the concept of dynamic attention will now be clarified.

Definition 2: Attention refers to the fact that any subset³ $F^Q \subseteq N$ of the elements in the example query pattern $S^Q$ can be specified at the post-learning stage as a field of attention, and the memory can confine its associative match (by a suitable matching criterion D) only within $F^Q$.

³ The membership in the attention subset can be bi-valued or analog. In the analog case a particular element can be a partial member of more than one subset, provided that all its memberships add up to one.

One of the most important aspects of attention based retrieval is the dynamic specifiability of the field of attention. Here dynamism refers to the post-learning changeability of the distribution of attention during query.

If a specific distribution of attention is given during encoding at the pre-learning stage, a conventional AAM can hard-encode it in the learned synaptic weights. However, once the learning is over, it does not allow the distribution of attention to be recast during query. For a given learning, it acts as a deterministic machine where each initial state flows into a pre-determined single attractor. Conventional AAMs have no mechanism to accommodate post-learning change in the distribution of attention. Dynamic attention is equivalent to the capability of accommodating varied perspectives on the query pattern.

2.3 Definitions of RCA Queries

The ability of retrieval with changeable attention is reflected in the type of distance evaluation criterion used by a memory. In general it is possible to define the following three matching criteria and corresponding RCA query types for the memory system model defined here.

Definition 3: A Unary Attention AM (RCA type-U) is one which retrieves a pattern $R^R \cong R^{T_u}$ such that the distance between its associated stimulus and the query pattern is evaluated by a matching criterion of the form:

$$D\left(S^Q, S^{T_u}\right) = \min_{1 \le \mu \le p} \sum_i^n \mathrm{dist}\left(s_i^Q, s_i^\mu\right) \quad \ldots(2)$$

This definition corresponds to a matching criterion which considers all elements in the query pattern to be equally important from the searcher's frame of reference. It converges on one of the p previously learned patterns.

If $F^Q$ represents a subspace of the total element space N, then the problem of associative retrieval with changeable attention can be stated in the following form:

Definition 4: A Binary Attention AM (RCA type-B) is one which retrieves a pattern $R^R \cong R^{T_b}$, where the set of elements in an attention vector $F^Q \subseteq N$ is dynamically specifiable during query, and the distance between its associated stimulus and the query pattern is evaluated by a matching criterion of the form:

$$D\left(S^Q, S^{T_b}, F^Q\right) = \min_{1 \le \mu \le p} \sum_{i \in F^Q} \mathrm{dist}\left(s_i^Q, s_i^\mu\right) \quad \ldots(3)$$

The above retrieval can be further generalized when the attention on a specific element is allowed to be partial. This generalized form of retrieval, characterized by changeable analog attention, can be stated as follows:

Definition 5: An Analog Attention AM (RCA type-A) is one which retrieves a pattern $R^R \cong R^{T_a}$, where the analog attention on the stimulus elements is represented by an additional, dynamically specifiable query vector of length n, $\Lambda^Q = [\lambda_1^Q, \lambda_2^Q, \ldots, \lambda_n^Q]$ with $0 \le \lambda_i^Q \le 1$, and the distance between its associated stimulus and the query pattern is evaluated by a matching criterion of the form:

$$D\left(S^Q, S^{T_a}, \Lambda^Q\right) = \min_{1 \le \mu \le p} \sum_i^n \lambda_i^Q \,\mathrm{dist}\left(s_i^Q, s_i^\mu\right) \quad \ldots(4)$$
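The three query types differ only in the attention vector handed to the matching criterion, as the following sketch shows; the function name and the |.| element distance are illustrative assumptions. Uniform weights give type-U, 0/1 weights give type-B, and fractional weights give type-A.

```python
import numpy as np

def rca_retrieve(S_stored, s_query, lam=None):
    """Best-match index under the criteria of equations (2)-(4).

    S_stored : (p, n) array of stored stimulus patterns
    lam      : attention vector of length n; None -> type-U (uniform),
               0/1 entries -> type-B, fractions in [0, 1] -> type-A
    """
    S_stored = np.asarray(S_stored)
    if lam is None:
        lam = np.ones(S_stored.shape[1])      # type-U: every element counts
    d = (lam * np.abs(S_stored - s_query)).sum(axis=1)
    return int(np.argmin(d))
```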

2.4 Retrieval in Current AAM Models

The optimization criteria of the existing neural models belong directly to the type-U category. Models which use the Hebbian class of learning maximize the global dot-product of the patterns [17]. On the other hand, models which use the LMS class of learning minimize the global mean square error [29]. A few other distance measures (such as entropy, maximum likelihood ratio, etc.) have also been used as matching criteria in conventional neuro-computing. However, Hopfield has provided a generalized perspective from which to analyze the collective behavior of a collection of interconnected neurons irrespective of the specific function they minimize or maximize. He has demonstrated that the convergence (or recollection) behavior of a collection of interconnected neurons can be interpreted as the minimization of some form of energy function [8]. The key features to note in the energy functions of current neural network models are that (i) the set operator is a summation process $M \equiv \sum$, (ii) the scope N of the set operator is all-inclusive and based on the entire element space, and (iii) the element distance function is only bivariate. Together, these properties make existing neural networks type-U memories.

Intuitively, the reason that conventional AAMs cannot support dynamic attention is twofold. (a) First, the discrete summation step, which is the foundation stone of the synaptic efficacy rule (like any other finite summation process), requires almost all of its input elements to be present. Although a summing output can tolerate some random statistical distortion of the input values, it cannot tolerate selective and deliberate (full or partial) withdrawal of inputs⁴.

⁴ Sherrington's [25] observation on the existence of some form of integration process at the nervous sites is generally used to rationalize the use of the linear weighted sum. However, the theory itself has still not been decidedly validated or refuted. More importantly, the weighted average suggested here does not imply the absence of integration. Sherrington's theory also suggests the existence of temporal summation [9]. Recent evidence suggests that in some cases two neurotransmitters can co-exist in axons. It is also plausible that the pre-synaptic dendrites have individual saturations like any other physical channel. All of this can potentially make the summation non-linear even at the channels.


(b) Secondly, in a scalar (one-dimensional) space, it is not possible to create a dynamic representation for the notion of 'don't-care'. The meaning of 'don't-care' is equivalent to specifying the state of an element which is not in the attention set $F^Q$. Any AAM constructed from interconnected cells of such finite discrete integrators (which includes almost all existing models) suffers from this fundamental limitation. In [11] it has been formally shown that:

Theorem 1: An associative memory constructed by interconnecting cells with the scalar product rule of synaptic transmission specified by the equation

$$r = f\left(\sum_i^n w_{ij}\, s_j + b_i\right)$$

cannot realize retrieval of type-B or type-A, where f() is any single-variate function, $s_j$ is a real-valued number in the range I = [0,1], and the weights $w_{ij}$ contain the learned patterns.

A memory based on multidimensional complex representation can overcome the above limitations of conventional AAMs and can support the generalized type-A as well as type-B retrieval.

3. COMPUTATIONAL MODEL

The computational model of MHAC is conceptually based on optical holography [4,28,23]. The details of this derivation can be found in [10]. This section briefly describes the model.

3.1 Representation

In this approach, each piece of information is mapped onto a multidimensional complex number (MCN). Each measurement $\alpha_k$ is mapped onto a set of phase elements $\theta_{j,k}$ in the range $\pi \ge \theta_{j,k} \ge -\pi$ through a mapping transformation $m^{+\alpha}(x)$, and the corresponding meta information $\beta_k$ is mapped as its magnitude $\lambda_k$ through another transform $m^{+\beta}(x)$⁵:

$$s_k = \{\alpha_k, \beta_k\} \Rightarrow \lambda_k e^{\sum_j^{d-1} i_j \theta_{j,k}} \quad \ldots(5)$$

Here each element $s_k(\lambda_k, \theta_{1,k}, \theta_{2,k}, \ldots, \theta_{d-1,k})$ is a vector inside a unit sphere in a d-dimensional spherical space, and each $\theta_{j,k}$ is the spherical projection (or phase component) of the vector along the dimension $i_j$. This computational representation will be called the multidimensional complex numeric (MCN) representation of information.

⁵ The inverse transformations to revert from the MCN representation are denoted by $m^{-\alpha}()$ and $m^{-\beta}()$ respectively.

Fig-3 Points on Hyperspherical Surface

Mapping of measurements: A class of functions can be used as the mapping transform $m^{+\alpha}()$. The function should be single-valued and continuous. For discrete inputs, continuity is required at the defined points. A desirable characteristic of the mapping transform is that it should maximize the symmetry in the phase domain.

Mapping of significance: Any positive-valued rule of mapping with the following two constraints can be used as $m^{+\beta}()$. Elements with the same magnitude (equi-significant) are required to contribute equally to the subsequent decision stages, and an element with magnitude zero should have no effect on the outcome of the computing. In addition, clipping the upper bound of the magnitude to 1.0 establishes a probabilistic interpretation of certain aspects of this representation. If all the elements of a pattern are made equi-significant, this representation becomes functionally equivalent to that of conventional AAMs. However, the opportunity to modify these magnitude values dynamically during query provides a new capability of selective attention.


Combined representation: Thus, each of the information elements is represented as a vector bounded in the unit multidimensional spherical space. A stimulus pattern is computationally represented as:

$$[S^\mu] = \left[\lambda_1^\mu e^{\sum_j^{d-1} i_j \theta_{j,1}^\mu}, \;\lambda_2^\mu e^{\sum_j^{d-1} i_j \theta_{j,2}^\mu}, \;\ldots, \;\lambda_n^\mu e^{\sum_j^{d-1} i_j \theta_{j,n}^\mu}\right]$$

A similar mapping on the external scalar response field intensities provides the response representation:

$$[R^\mu] = \left[\gamma_1^\mu e^{\sum_j^{d-1} i_j \phi_{j,1}^\mu}, \;\gamma_2^\mu e^{\sum_j^{d-1} i_j \phi_{j,2}^\mu}, \;\ldots, \;\gamma_m^\mu e^{\sum_j^{d-1} i_j \phi_{j,m}^\mu}\right]$$

Here the phasor $\phi$ represents the measurement of the retrieved response and $\gamma$ represents the expected confidence (system-assigned significance) on $\phi$.
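For concreteness, a minimal Python sketch of this representation for the ordinary complex (d = 2) special case; the linear phase map standing in for $m^{+\alpha}$ and the clipped identity standing in for $m^{+\beta}$ are assumptions for illustration, not the paper's prescribed transforms.

```python
import numpy as np

def to_mcn(alpha, beta):
    """Map measurements alpha in [0,1] and significances beta in [0,1]
    to phasors lambda * exp(i*theta) inside the unit disk (d = 2 case)."""
    theta = (2.0 * np.asarray(alpha) - 1.0) * np.pi   # assumed m_alpha: [0,1] -> (-pi, pi]
    lam = np.clip(np.asarray(beta), 0.0, 1.0)         # assumed m_beta: clip to [0,1]
    return lam * np.exp(1j * theta)
```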

History of MCN representation: The use of complex numbers is not a completely new concept in artificial associative computing, at least in two dimensions. In 1990 Sutherland [26], in his pioneering work, presented the first truly holographic associative memory, with holographic representation and a learning algorithm analogous to the correlation learning used here. It is a 2-dimensional special case of the generalized multidimensional phasor representation introduced here, and much of the conventional retrieval-based (RCA type-U) behavior of the 2-dimensional representation was investigated in depth in that pioneering work. More recently Timothy Masters [20] also reported another 2D complex-valued network with a learning algorithm analogous to backpropagation. However, both of these attempts remained focused on the networks' efficiency as conventional adaptive filters (with type-U retrieval). The fundamentally different phenomenon of attention (type-A/B retrieval) associated with such representation [11] remained unexplored.

The first artificial system ever to demonstrate associative phenomena, optical holography itself [4], can be considered a complex-valued computation mechanism. When pioneering researchers⁶ ventured to recreate such fascinating optical transforms artificially, they adopted some simplifications to gain efficiency on digital computers. One of those early simplifications was the use of scalar numbers instead of the 2D optical wave. All subsequent research adopted this simplified representation, and its implication was hardly ever reinvestigated. In that sense this work is a visit back to the lost dimensionality of representation, and a step beyond: it explores a computational model based on multidimensional phasor (instead of only 2-D phasor) representation.

⁶ Following the work of Gabor, in the late 60's Willshaw started investigating the design of a distributed content-addressable memory on holographic principles [30]. In 1971 he proposed the correlograph model. However, it also used the simplified scalar representation instead of the holographic multidimensional representation. This "correlograph" model is often referred to as "holographic"; in the technical sense, however, it is closer to the Hebbian-learning-based neural networks than to the holographic memory discussed in Sutherland [26], Masters [20] and this paper.

3.2 Encoding

In an associative memory, information is stored in the form of associations. In the encoding process, the association between each individual stimulus and its corresponding response is defined in the form of a correlation matrix. This matrix is computed as the inner product of the conjugate transpose of the stimulus vector and the response vector, as stated in equation (6):

$$[X^\mu] = [S^\mu]^T \cdot [R^\mu] \quad \ldots(6)$$

If the stimulus is a pattern with n elements and the response is a pattern with m elements, then $[X^\mu]$ is an $n \times m$ matrix with d-dimensional complex elements.

A suite of associations derived from a set of stimuli and corresponding responses is stored in the following correlation matrix X. The resulting memory substrate containing the correlation matrix is referred to as a holograph:

$$[X] = \sum_\mu^p [X^\mu] = \sum_\mu^p [S^\mu]^T [R^\mu] \quad \ldots(7)$$
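A minimal numpy sketch of equations (6)-(7) for the d = 2 case; the conjugation realizes the conjugate transpose of the stimulus mentioned above.

```python
import numpy as np

def encode_holograph(S, R):
    """Equations (6)-(7): X = sum_mu conj(S^mu)^T . R^mu.

    S : (p, n) complex array of stimulus patterns
    R : (p, m) complex array of response patterns
    Returns the (n, m) complex holograph [X].
    """
    return np.einsum('pn,pm->nm', np.conj(S), R)
```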

3.3 Retrieval

During recall, the query stimulus pattern $[S^e]$ is represented by:

$$[S^e] = \left[\lambda_1 e^{\sum_j^{d-1} i_j \theta_{j,1}^e}, \;\lambda_2 e^{\sum_j^{d-1} i_j \theta_{j,2}^e}, \;\ldots, \;\lambda_n e^{\sum_j^{d-1} i_j \theta_{j,n}^e}\right]$$

The decoding operation is performed by computing the inner product between the excitatory stimulus and the correlation matrix X:

$$[R^e] = \frac{1}{c}\,[S^e] \cdot [X], \quad \text{where } c = \sum_k^n \lambda_k \quad \ldots(8)$$

Although the above computation appears analogous to the conventional associative computing paradigm, it displays fundamentally different characteristics: the two process the measurement component of information quite differently. The next section explains the fundamental distinctions that make this new parallel and distributed computing paradigm capable of supporting type-A and type-B RCA search.
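Continuing the sketch above, decoding per equation (8) is a single normalized inner product; the phases of the result carry the retrieved measurements and the magnitudes the confidence.

```python
import numpy as np

def recall(X, s_query, lam):
    """Equation (8): [R^e] = (1/c) [S^e].[X] with c = sum_k lambda_k.

    X       : (n, m) holograph from encode_holograph()
    s_query : length-n complex query pattern (magnitudes equal to lam)
    lam     : attention/significance magnitudes of the query elements
    """
    c = lam.sum()
    r = (s_query @ X) / c
    return np.angle(r), np.abs(r)   # retrieved measurements, confidences
```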


3.4 Distinction of Holographic Computation

Transfer function: The above encoding and decoding algorithm can be realized in a distributed network of cells, just like a conventional neural network, where each cell is responsible for a simple computation with a transfer function of the form:

$$\bar{z}_i = \sum_j^n \bar{w}_{ij}\,\bar{s}_j \quad \ldots(9)$$

where all the barred elements are MCNs instead of scalars. The transformation it realizes on the measurement component of the input information is fundamentally different from that of any existing AAM. Let $\bar{w}_{ij} = \|w_{ij}\|\, e^{-i\omega_{ij}}$. Then the transformation between the measurement components of input and output is given by:

$$\phi_i = \cos^{-1}\left[\frac{1}{c} \sum_j^n \|w_{ij}\| \cos(\theta_j - \omega_{ij})\right], \quad \text{where } c = \sum_j^n \|w_{ij}\| \quad \ldots(10)$$

For comparison, the scalar product rule of synaptic efficacy used by conventional AAMs is given below with equivalent notation:

$$\phi_i = f(y_i), \quad y_i = \sum_j^n w_{ij}\,\theta_j + b_i \quad \ldots(11)$$

This new transfer function has three characteristics that distinguish it from conventional transfer functions. First, the transfer function is a weighted trigonometric (cosine) mean, in contrast to the conventional weighted sum. Secondly, there is no explicit activation function. Thirdly, all the individual synaptic inputs have their private thresholds, rather than a single threshold at the output.
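The two transfer functions can be sketched side by side for the d = 2 case (assuming nonnegative synaptic magnitudes, so the weighted cosine mean stays within [-1, 1]):

```python
import numpy as np

def holographic_cell(theta, w_mag, w_phase):
    """Equation (10): output phase as a weighted trigonometric (cosine) mean."""
    c = w_mag.sum()
    return np.arccos(np.sum(w_mag * np.cos(theta - w_phase)) / c)

def conventional_cell(theta, w, b, f=np.tanh):
    """Equation (11): weighted sum followed by a single global activation f."""
    return f(np.sum(w * theta) + b)
```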

Synaptic transmission rule: The most important distinction is the first one. A finite summation process is tolerant to random statistical distortion, but is not tolerant to selective and deliberate loss of inputs. In contrast, a mean process is robust in both senses. This is the key distinction that allows a holographic cell, and thus a network of holographic cells, to conduct RCA search.

Mapping ability: The second and third distinctions are related, and together they determine the mapping ability. In any associative memory the non-linearity decides the nature of the discriminating hyperplane that distinguishes classes. For a holographic cell the trigonometric transformation pairs serve as the implicit non-linearity. The only fundamental difference is that the non-linearity here is local (like the existence of individual thresholds for each element). In contrast, conventional neurons use a global non-linearity which is applied after the weighted sum. Such localization of the non-linearity is essential for attaining robustness against missing elements.

Hyperspherical representation: The fundamental distinction of the holographic cell can also be visualized from a representational perspective. One of the basic limitations of the conventional network is that there is no representation of 'don't-care'. An element labeled as 'don't-care' should be represented in such a way (state) that all the valid enumeration values of its measurements (states) are equipotential. On a one-dimensional (linear) space, it is not possible to obtain a point which is equidistant from all possible enumerations of an analog measurement: any forced enumeration of 'don't-care' on a real line will always induce undue bias towards two of the enumerations over all others. An obvious solution to this representation problem is to place the enumerations on a plane. The MCN representation generalizes this solution a step further and puts the enumerations on the surface of a hypersphere (Fig-3). The center enumerates an unbiased 'don't-care'.

The ability of this computational model to perform basic associative retrieval and also to satisfy the behavioral expectations of an attentive memory (outlined in section 2.1) has been formally shown in [10]. This paper now presents its performance analysis.

4. ANALYSIS OF PERFORMANCE

In this section the capacity and accuracy of this memory are measured.

Retrieval: The retrieved association can be decomposed into two parts, a principal component and a crosstalk component, by combining equations (6), (7) and (8):

$$[R^e] = \frac{1}{c}[S^e][S^t]^T[R^t] + \frac{1}{c}\sum_{\mu \ne t}^p [S^e][S^\mu]^T[R^\mu] = [R^e_{principal}] + [R^e_{crosstalk}] \quad \ldots(12)$$

Here $S^t$ is considered the candidate match (or target pattern). Both the principal and crosstalk components are derived below.

Principal Component: Individual elements of the retrieved pattern are retrieved in an identical manner, independently of each other. Let us consider the retrieval of the uth component of the response, assuming that all the encoded stimulus patterns have $\lambda = 1$:

$$r_{u(principal)}^e = \frac{1}{c}[S^e][S^t]^T r_u^t = \frac{1}{c} \sum_k^n \lambda_k e^{\sum_j^{d-1} i_j \left(\theta_{j,k}^e - \theta_{j,k}^t\right)} r_u^t \quad \ldots(13)$$


If the query stimulus and the target stimulus correspond closely, then for every j and k the phase terms $\theta_{j,k}^t \to \theta_{j,k}^e$. Thus all the exponent terms become unity with no phase disturbance, which reduces equation (13) to:

$$r_{u(principal)}^e \cong \frac{1}{c} \sum_k^n \lambda_k\, r_u^t \quad \ldots(14)$$

The phase of the retrieved response corresponds to the retrieved information, and is equivalent to the phase of the encoded response:

$$\mathrm{arg}_c\left(r_{u(principal)}^e\right) \cong \mathrm{arg}_c\left(r_u^t\right) \quad \ldots(15)$$

Crosstalk Component: Similarly, the crosstalk component is given by:

$$r_{u(crosstalk)}^e = \frac{1}{c} \sum_{\mu \ne t}^p [S^e][S^\mu]^T [R^\mu] = \frac{1}{c} \sum_{\mu \ne t}^p \sum_k^n \lambda_k e^{\sum_j^{d-1} i_j \left(\theta_{j,k}^e - \theta_{j,k}^\mu\right)} r_u^\mu \quad \ldots(16)$$

Saturation Ratio: The saturation ratio is defined as the ratio of the signal to noise magnitudes⁷:

$$SR = \frac{\left|r_{u(principal)}^e\right|}{\left|r_{u(crosstalk)}^e\right|}$$

⁷ Note that the saturation ratio is not the same as the signal-to-noise ratio (SNR): the SNR is the ratio of the signal to noise measurements.

Fig-4 Angular Span of Elements

Let $P(a \times b, i)$ denote the hyperplane spanned by the ith elements of the ath and bth patterns (Fig-4), and let the orientation angle of element $s_i^a$ in this plane be denoted by $\psi_i^a|_{P(a \times b, i)}$. The difference between the orientation angles signifies the direct angular span between the elements $s_i^a$ and $s_i^b$. Let us also define:

$$\phi_{k-l}^{e-\mu} = \left[\psi_k^e|_{P(e \times \mu, k)} - \psi_k^\mu|_{P(e \times \mu, k)}\right] - \left[\psi_l^e|_{P(e \times \mu, l)} - \psi_l^\mu|_{P(e \times \mu, l)}\right]$$

It denotes the difference between the angular spans of the kth and lth elements of the query (eth) and the μth stimulus patterns. It can be shown through some straightforward trigonometric manipulation that:

$$SR = \sqrt{\frac{\sum_k^n (\lambda_k)^2 + \sum_k^n \sum_{l \ne k}^n \lambda_k \lambda_l \cos\phi_{k-l}^{e-t}}{(p-1)\sum_k^n (\lambda_k)^2 + \sum_{\mu \ne t}^p \sum_k^n \sum_{l \ne k}^n \lambda_k \lambda_l \cos\phi_{k-l}^{e-\mu}}}$$

Assuming an independent, identical and symmetrical distribution of the $\theta$-suite over the element space of all the enfolded patterns, $E\left[\cos\phi_{k-l}^{e-\mu}\right] = 0$, or, for sufficiently large pn:

$$\sum_{\mu \ne t}^p \sum_k^n \sum_{l \ne k}^n \lambda_k \lambda_l \cos\phi_{k-l}^{e-\mu} \to 0$$

Thus:

$$SR = \sqrt{\frac{1}{p-1} \cdot \left(1 + \frac{\sum_k^n \sum_{l \ne k}^n \lambda_k \lambda_l \cos\phi_{k-l}^{e-t}}{\sum_k^n (\lambda_k)^2}\right)}$$


Let us define a distance measure d between two patterns such that the $\alpha$-suite elements of the stimuli $S^e$ and $S^t$ are bounded by the distance d over the entire set:

$$\left|\psi_j^e - \psi_j^t\right| \le d \;\text{ for all } j, \quad \text{which implies} \quad 0 \le \left|\phi_{k-l}^{e-t}\right| \le 2d$$

If the distance between the candidate and the query is large ($d(e,t) \gg 0$), then:

$$SR \approx \sqrt{\frac{1}{p-1}}$$

On the other hand, for a close match ($d(e,t) \to 0$):

$$SR = \sqrt{\frac{1}{p-1} \cdot \left(1 + \frac{\sum_k^n \sum_{l \ne k}^n \lambda_k \lambda_l}{\sum_k^n (\lambda_k)^2}\right)} \approx \sqrt{\frac{n \cdot w}{p-1}}$$

where w is the attention strength.

Definition 6: Attention Strength w refers to the relative strength of the attention distribution over the element space, and is defined by:

$$w = \frac{\left(\sum_k^n \lambda_k\right)^2}{n \cdot \sum_k^n (\lambda_k)^2} = \frac{[E\{\lambda\}]^2}{E\{\lambda^2\}} \quad \ldots(17)$$

The attention strength w intuitively refers to the 'porosity' of the window frame. It varies from 0 to 1 and depends on the distribution of $\lambda$ in the query field. For type-U search w = 1. Thus, when $\lambda_j \to 1$ for all elements during encoding:

$$SR = \sqrt{\frac{n}{p-1}}$$

The above result can be summarized as:

Result 1: For the attentive memory specified by equations (6), (7), and (8), with n stimulus elements, p stored patterns, and an unequal distribution of attention specified by the vector $\Lambda^e = [\lambda_1, \lambda_2, \ldots, \lambda_n]$, the saturation is given by:

$$SR \approx \sqrt{\frac{n \cdot w}{p-1}} \quad \ldots(18)$$

when (i) $p \gg 1$, and (ii) the elements are symmetrically distributed in phase space. Here w refers to the 'porosity' of the attention distribution.
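As a worked illustration (numbers chosen here for convenience, not taken from the paper's experiments): for a binary field of attention covering m of the n elements, equation (17) gives $w = m^2/(n \cdot m) = m/n$. With $n = 1000$, $p = 21$ stored patterns, and attention confined to $m = 100$ elements (a 10% cue):

$$w = \frac{100}{1000} = 0.1, \qquad SR \approx \sqrt{\frac{n \cdot w}{p - 1}} = \sqrt{\frac{1000 \times 0.1}{20}} = \sqrt{5} \approx 2.24$$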

Accuracy of Retrieval: Now the accuracy of retrieval will be derived. The resultant response is given by the sum of the principal and crosstalk components. The case is investigated assuming a perfect query, meaning $S^e$ closely resembles one of the stored patterns. From equation (12) it can be seen that the capacity is limited by the accumulation of crosstalk from an increasing number of patterns. Let the crosstalk component be given by $r_N e^{i\theta_N}$ and the principal component by $r_S e^{i\theta_S}$, where the angles correspond to the direct angular spans of the components in the $P(r_{principal} \times r_{crosstalk}, u)$ hyperplane. Then the error in the phase (which represents the measurement) of the resultant component is given by:

$$\phi_e = \tan^{-1}\left[\frac{r_N \sin(\theta_N - \theta_S)}{r_S + r_N \cos(\theta_N - \theta_S)}\right] \quad \ldots(19)$$

Fig-5 illustrates this addition in hyperspherical space. The phase deviation $\phi_e$ is maximum when $(\theta_N - \theta_S)$ approaches $90°$. Thus, for the saturation given by equation (18), the maximum phase error is:

$$\left|\Phi_{error}\right|_{max} = \sin^{-1}\sqrt{\frac{p}{w\,n}} \quad \ldots(20)$$

when (i) $p \gg 1$, and (ii) the input elements are symmetrically distributed in phase space.

Result 2: For an MHAC specified by equations (6), (7), and (8) with n stimulus elements and p stored patterns, the maximum distortion due to crosstalk is given by equation (20), when (i) $p \gg 1$, and (ii) the input elements are symmetrically distributed in phase space.
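Continuing the same illustrative numbers ($n = 1000$, $p = 21$, $w = 0.1$), equation (20) bounds the phase distortion at:

$$\left|\Phi_{error}\right|_{max} = \sin^{-1}\sqrt{\frac{21}{0.1 \times 1000}} \approx \sin^{-1}(0.46) \approx 27°$$

i.e., roughly 7.5% of the full $2\pi$ phase range; raising n (or lowering p) tightens this bound roughly as the square root.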

The above analysis shows that reduced focus can be effectively (almost linearly) compensated with a higher n or a lower p. This result is very significant, because even for a fixed-size problem it is possible to design a network with an exponentially higher effective stimulus length (n) by various techniques (such as higher-order encoding). The above analysis provides the clue to selecting a suitable n for a particular application.


Notably, the performance of this memory depends on the symmetry of the element distribution in the phase space, whereas the performance of conventional neural networks is tied to the uniformity of the distribution. Highly correlated pattern elements destroy uniformity and consequently the performance of many ANNs. But uniform distribution is a special case of symmetrical distribution and is more restrictive; it is possible to obtain a symmetrical distribution without uniformity, because, unlike the real interval (which enumerates the elements of conventional ANNs), the phase space is harmonic. As a result, MHAC performance is less restrictively tied to correlated data sets.

Fig-5 Geometry of Phase Error

Error due to imperfect query pattern: The two sources of error in the final response are (i) crosstalk due to saturation and (ii) deviation of the principal component due to deviation of the query pattern from the target pattern. The previous analysis showed the amount of error due to crosstalk. The error due to pattern deviation is the sum of the deviations of the individual pattern elements; writing the elementwise shift as $\theta_i^t - \theta_i^e = \epsilon_i$, the response moves linearly away from the target pattern with the mean of the shift in the query when the error is small. It can be geometrically shown that the magnitude of the error due to the deviation of a pattern element grows in the order of $\sqrt{2}\,\sin(\epsilon/2)$.

5. EXPERIMENTS

The analysis of the last section shows that the performance of this new memory depends on (i) the strength of focus, (ii) the length of the stimulus patterns, (iii) the number of encoded patterns, and (iv) the distribution of the data. This section presents a set of experiments to empirically validate and investigate the effect of each of these factors. First, the parameters used in these experiments are explained.

5.1 Parameters

Definition 7: Accuracy of Retrieval (SNR) is measured as the peak signal to noise ratio in the measurement component of information over all the elements:

$$SNR = 20 \log \frac{2\pi}{mse}, \qquad mse = \sqrt{\frac{1}{m} \sum_i^m \left[\phi_i^\mu - \phi_i^{T(\mu)}\right]^2}$$

The peak signal is given by the dynamic phase range $2\pi$. The average SNR is computed by averaging over all the pattern associations enfolded in the memory.

Definition 8: Focus Strength (f) is defined as the ratio of the input significance strength of a query pattern to the significance strength of the encoded pattern:

$$f = \frac{\sum_i^n \lambda_i}{n}$$

A uni-magnitude encoding of pattern elements has been assumed. Its value varies from 0 to 1. In the plots, QPD = 1 - f (query pattern distortion) has been used.

Definition 9: Load Factor (L) is defined as the ratio of the number of stimulus-response associations (p) encoded to the total number of elements (n) in the patterns:

$$L = \frac{p}{n}$$

As evident, the length of the stimulus (n) is already incorporated in the load factor.

Definition 10: Asymmetry (k) of a pattern refers to the circular distribution of the pattern elements around the center of the representation hypersphere:

$$k = \frac{\left|\sum_i^n \lambda_i \exp\left(\sum_j^d i_j \theta_{ij}\right)\right|}{\sum_i^n \lambda_i}$$

Its value varies from 0 to 1. In all these experiments, pattern elements have been generated randomly with a clipped Gaussian⁸ distribution to match natural distributions (such as image intensity). However, the standard deviation has been varied to generate data with different asymmetry characteristics: a high standard deviation (SD) corresponds to low asymmetry and vice versa.

⁸ Because of the circular nature of the phase space, only those random draws which fall between 0 and $2\pi$ have been used.
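All four parameters reduce to a few lines of numpy; this sketch assumes the d = 2 case and base-10 logarithms for the decibel figure.

```python
import numpy as np

def focus_strength(lam):
    """Definition 8: f = (sum of query significances) / n."""
    return lam.sum() / lam.size

def load_factor(p, n):
    """Definition 9: L = p / n."""
    return p / n

def asymmetry(lam, theta):
    """Definition 10 (d = 2): normalized magnitude of the resultant phasor."""
    return np.abs(np.sum(lam * np.exp(1j * theta))) / lam.sum()

def snr_db(phi, phi_target):
    """Definition 7: peak signal 2*pi over rms phase error, in dB."""
    mse = np.sqrt(np.mean((phi - phi_target) ** 2))
    return 20.0 * np.log10(2.0 * np.pi / mse)
```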


Besides investigating the general relationship among these critical parameters, these experiments simultaneously examine the specific ranges of these parameters within which an effective and cost-efficient attentive memory can be constructed.

5.2 ORS Experiment

The parameters of the attentive memory are independent and monotonic, and together they span a parameter space. The objective of the experimentation is to determine the subspace (and its boundaries) of this parameter space within which it is possible to guarantee a target performance. It is called the operational range space (ORS).

The availability of an ORS is advantageous from the engineering point of view. Given an ORS, when a new application is taken under consideration, all that is required is to measure the application-specific parameters and to verify whether they fall inside or outside the ORS. If within, the pre-analyzed results available from ORS experimentation can be used to predict approximate performance, and the necessary configuration of the system for that application can be estimated. If outside, the ORS experimentation results can still be used to identify the exact intervention that would bring the application within the ORS.

5.3 Analysis of Experiments

Focus characteristic: The retrieval performance under variation of the focus strength is shown first. A set of holographs has been generated, each with a varied number of encoded patterns. After the training, recalls have been performed using a random part of each originally stored pattern as the query pattern. The focus strength has been controlled by varying the size of this part; for query pattern elements not selected in the focus set, $\lambda_i \approx 0$ has been used. Fig-6 shows the typical average signal-to-noise ratio (left y-scale) and percentage of dynamic error (right y-scale) under smooth variation of the focus strength of the query pattern. The three curves in this graph show the focus characteristics for three different load factors L = .02, .04 and .08. For all these cases the patterns have length n = 1000 and asymmetry SD = 1.0 (k = .6). Fig-7 plots the performance for three different element distributions with SD = .8, 1.2, and 3 for L = .02, while the other parameters remain the same.

As evident from both of these plots, a typical focus characteristic curve is monotonic and resembles a fat sigmoid. These curves generally demonstrate three distinct zones: (a) a high-performance zone, (b) a linear zone, and (c) a cut-off zone.

The high-performance zone corresponds to RCA type-U search performance. This zone is characterized by regular-AAM-like high focus and features high accuracy. As evident from the accuracy levels of this zone, an attentive memory, even when it acts as a regular memory, far exceeds the retrieval accuracy of most other analog AAMs. This zone demonstrates accuracy over 30 dB (which in other words means less than 2-3% phase value error).

The most significant is the linear zone, in which the accuracy gracefully decays with the focus strength. Analytically, the characteristics of this zone correspond to equation (20). As can be seen, the focus strength can be reduced almost as low as 0.1 before the accuracy falls below 20 dB. In marked contrast, a regular AAM shows avalanche degeneration of performance when the focus strength approaches just 0.6-0.5 [27]. The cut-off zone for this attentive memory begins around 0.1, while that of conventional AAMs begins at 0.5. As can be observed in these plots, the typical ORS boundaries are: (a) the high-performance zone extends over f = 1.0-0.9, and (b) the linear zone extends over f = 0.9-0.1.

Fig-6 Focus characteristics (n=1000, p/n=.02,.04,.08, I=5, DV=3.0): average SNR (dB) and average recall error (%) vs. query pattern distortion, showing the high-gain, linear-performance, and cut-off zones.

Fig-7 Focus characteristics (n=1000, p/n=.02, I=5, DV=.8,1.2,3): average SNR (dB) and average recall error (%) vs. query pattern distortion.

Fig-8 Loading characteristics (n=1000, I=30, SD=1, f=.75,.50,.25): average SNR (dB) and average recall error (%) vs. loading (L=p/n).

Loading characteristic: The next simulation shows the effect of loading on the performance of the attentive memory. To determine the ORS boundaries of the load factor range, first a pool of clipped Gaussian patterns has been generated (all with a fixed length). Different holographs have then been trained, each time taking a different number of patterns from this pool.

Fig-8 shows a typical loading performance. It plots the SNR (y-axis) against various load factors (x-axis) for three RCA type-A cases with focus strengths f = 0.75, f = 0.50, and f = 0.25. The pattern sets are generated with standard deviation SD = 1.0; the average asymmetry (k) of these patterns is found to be k = 0.3.

As shown in this plot, a typical loading characteristic curve shows monotonically decreasing performance with increasing load factor. Quantitatively, for f = 0.25, the RCA type-A retrieval accuracy drops to 20 dB as the load factor reaches 0.07. Typically a load factor of 0.03 to 0.10 can be reached while maintaining 20 dB performance with f = 0.3-0.1. This range defines the load factor boundary of the ORS. This experiment shows that an enormous number of pattern associations can be stored in and retrieved from a single holographic memory. For example, a load factor of 0.02 means that about 5,000 images of size 512x512 can be enfolded into a single holographic attentive memory and searched with RCA type-A capability. This particular search (which is equivalent to searching 1.25 GB of raw space) can be performed at the cost of only one complex inner product⁹. Table-4 lists a few other loading scenarios.

⁹ The size of the matrix is n×m, where m is the response label size and typically is O(log p), whereas a procedural best-match search is O(n×p).
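A quick arithmetic check of this example (assuming one byte per pixel): $n = 512 \times 512 = 262{,}144$ elements per stimulus, so a load factor of $L = 0.02$ permits $p = L \cdot n \approx 5{,}243$ patterns, and the enfolded raw content is $p \cdot n \approx 1.3 \times 10^9$ bytes, consistent with the 1.25 GB figure above.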

Fig-9 Data distribution characteristics (n=1000, p/n=.02, I=5, f=.75,.50,.25): average SNR (dB) vs. asymmetry (k).

Asymmetry characteristics: The ORS boundariesof asymmetry parameter can be observed through theprojection of range-space by continuous variation of k.To perform this experiment, several sets of patterns havebeen generated with varying standard deviations. Theseare then encoded into different holographs. Variation inthe standard deviation (of clipped Gaussian distribution)generates data sets with various asymmetries. The nar-rower the deviation, the higher the asymmetry.

Fig-9 shows a typical data distribution characteristic.It plots the SNR (y-axis) against the primary parameterasymmetry (x-axis). Three RCA type-A test results withsecondary parameter focus strength f=0.75, f=0.50, andf=0.25 have been shown by the three curves. For theseexperiments, each of these holographs has been loadedwith a load factor =0.02, and has been trained with 5iterations.

Typically, as the asymmetry increases, the perform-ance of MHAC decreases. As shown in Fig-9, MHAC cantolerate up to 0.6 asymmetry of the data distribution andcan still maintain 20 dB performance within the opera-

tional range-space.

The result of this experiment is particularly importantfor the design of the mapping transform. The actual natureof data distribution depends on the application and in mostcases isbeyond thecontrolof thesystem designer. Table-3shows some examples of asymmetry values for fewtypical images. In the extreme cases of unusually illskewed data set, the above result provides importantguideline to the designer. Appropriate transformscan be designed by which the asymmetry level of theprocessed data can be hashed within the acceptable range.

Effect of stimulus length: As evident from its definition, the stimulus length is already a part of the load factor. Therefore, the principal effect of n can be observed readily in Fig-8. However, an important remaining question is whether the performance of holographic attentive memory is sustainable at larger scales of this memory, with larger values of p and n, even when their ratio is fixed. This experiment is specifically designed to investigate such scalability of the attentive memory.



Fig-10 Focus Characteristics: average SNR (dB) vs. query pattern distortion for n=400 to 6400 (p/n=.02, I=5, SD=3).

Fig-11 Loading Characteristics: average SNR (dB) vs. loading (L=p/n) for n=100, 200, 400, and 800 (I=10, SD=3, f=1.0).


Fig-10 plots the SNR against focus strength f for exponentially varying n=400, 800, 1600, 3200, and 6400 at a fixed p/n ratio. Although the problem scale varies exponentially, these curves overlap one another, demonstrating the invariance of the focus characteristics with respect to the scale of n. Similarly, Fig-11 shows the scale invariance of the loading characteristics. The result has been verified by repeated simulation of all the other plots at other scales.

This result also conforms to the analytical derivations (equations (18) and (20)), which show that most of the performance characteristics are related to the load ratio p/n rather than to n. Both the analytical and the experimental results indicate the enormous algorithmic scalability of the attentive memory with sustainable performance.

Dimensionality of representation: The objective of this final set of simulations is to examine the effect of the representation dimension (the dimension of the hyperspherical space in which the element vectors are oriented) on the performance of the attentive memory.

For this experiment the same sets of randomly generated vectors have been encoded with different dimensionalities of representation. To generate asymmetry variations, the orientation phases of these vectors are mapped uniformly within the range +R ≥ θ > −R during mapping. Five specific distributions with R = 45°, 60°, 90°, 120°, and 180° have been considered. A narrower R corresponds to a narrower distribution range and thus higher asymmetry. The experiment has been performed with P (=16) vectors, each with S (=32) D-dimensional elements. Each of the phase components has been generated randomly with a uniform step distribution within the range R. During the recall process the principal component and the cross component of the response are measured separately. The experiment is repeated for dimensions D=2 to 10.
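Since the paper's hyperspherical encoding algebra is not reproduced here, the sketch below only illustrates the geometric trend behind this experiment: unit vectors on the (D−1)-sphere whose D−1 angles are drawn uniformly from (−R, +R) de-correlate as D grows, so the cross component between independent patterns shrinks. The construction and the crosstalk measure are our assumptions, not the paper's encoding.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_unit_vectors(num, dim, r_deg):
    # Unit vectors on the (dim-1)-sphere: dim-1 hyperspherical angles,
    # each drawn uniformly from (-R, +R) degrees.
    r = np.deg2rad(r_deg)
    ang = rng.uniform(-r, r, (num, dim - 1))
    v = np.ones((num, dim))
    for j in range(dim - 1):               # hyperspherical -> Cartesian
        v[:, j] *= np.cos(ang[:, j])
        v[:, j + 1:] *= np.sin(ang[:, j])[:, None]
    return v

S = 32                                     # stimulus length, as in the text
for r in (45, 60, 90, 120, 180):
    row = []
    for D in (2, 4, 6, 8, 10):
        a = random_unit_vectors(S, D, r)   # one pattern
        b = random_unit_vectors(S, D, r)   # an independent second pattern
        row.append(abs(np.mean(np.sum(a * b, axis=1))))  # assumed crosstalk
    # Crosstalk falls with D; values are noisy at this small S.
    print(f"|R|<{r:3d}: " + "  ".join(f"{c:.2f}" for c in row))
```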

Fig-12 SNR and Dimension: average SNR (dB) vs. dimension D=2-10 (stimulus length=32, patterns=16) for |R|<45°, 60°, 90°, 120°, and 180° (multi-dimensional error growth).


Fig-13 Crosstalk and Dimension: normalized crosstalk vs. dimension D=2-10 (stimulus length=32, patterns=16) for |R|<45°, 60°, 90°, 120°, and 180° (multi-dimensional error growth).

Fig-12 plots the signal-to-noise ratio against the dimensionality; Fig-13 plots the crosstalk component against dimensionality. The results of this experiment clearly show that the SNR improves (Fig-12) and the crosstalk decreases (Fig-13) as one shifts to a higher dimensional representation. It is also evident that the improvement is more pronounced when the phase distribution window is narrow.

This experiment helps in appreciating the contribution of multidimensional representation to associative computing. A 2D representation space makes it possible to incorporate the novel notion of attention into associative memory, and thus makes a qualitative difference over the capabilities of representationally scalar associative computing. Higher dimensions can further increase its performance¹⁰, and thus make an additional quantitative improvement.

5.4 Summary of ORS Boundaries

The quantitative results of the ORS experiments are summarized in Table-2, suggesting an operational range space (ORS) for the 2D attentive memory system. Table-2 in particular guarantees an accuracy in the range of 20 dB. It shows the asymmetry, load factor, focus, and iteration ranges required to achieve this target performance. All these parameters are monotonic; hence the space spanned by these boundary values and the co-ordinate planes represents the ORS.

Parameter     Unit       Operational range
Accuracy      SNR (dB)   > 20
Asymmetry     k          < 0.6
Load factor   L          < 0.08
Focus         f          > 0.1

Table-2 Operational Range Space

In Table-2, accuracy is the target parameter. Asymmetry is a data-dependent parameter and is a semi-controllable constraint in a given application. It is possible to improve symmetry through various smoothing techniques. Table-3 shows the asymmetry measures of a few well-known¹¹ example images.

image    k (1st order)   dimension
lake     .23             512x512
tree     .19             256x256
lena     .38             256x256
house    .27             256x256
pepper   .17             512x512

Table-3 Typical Asymmetry

¹⁰ This characteristic has been analytically validated in detail in [10].

¹¹ These can be downloaded from the authors' home page.


Loading and training are two controllable design parameters. Loading is closely tied to the space efficiency of any associative memory. The dimension of a holograph is determined by the lengths of the stimulus (n) and response (m) patterns. The load factor provides an estimate of how many such patterns can be enfolded in a single holographic memory. Table-4 shows typical estimates of the number of patterns that can be stored (and queried) for a few image sizes. However, for patterns of limited size, the load factor is not necessarily a hard limitation: the number of stored patterns p for relatively small patterns can be increased by higher order encoding.

image size (n)   L     p (1st order)
160x120          .04   768
256x256          .02   1310
512x512          .02   5120
1024x1024        .01   10240
1024x1024        .02   20480

Table-4 Typical Memory Loading

The above operational range-space provides a quick means of predicting the performance and estimating the design parameters whenever a new application is considered. For example, if an associative memory of CT-scan images is to be constructed, all that is required is to estimate the asymmetry characteristics of the images. If the asymmetry is within the range space (k<0.6), then it is possible to predict the required dimension and other parameters for the corresponding attentive memory system. On the other hand, if k>0.6, it is still possible to estimate how much smoothing is needed to obtain the target performance.
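This design-time check is mechanical enough to state as code. A minimal sketch, with the Table-2 boundaries hard-coded and function names of our own choosing:

```python
# Design-time ORS check using the Table-2 boundaries (20 dB target).
# Inputs are application estimates; names are ours, not the paper's.
ORS = {"k_max": 0.6, "L_max": 0.08, "f_min": 0.1}

def within_ors(k, load_factor, focus):
    """Predict whether >20 dB retrieval accuracy can be expected."""
    return (k < ORS["k_max"]
            and load_factor < ORS["L_max"]
            and focus > ORS["f_min"])

# Example: a CT-scan archive whose images show k ~ 0.3, stored at L = 0.02,
# queried with cues covering at least 15% of the frame.
print(within_ors(k=0.3, load_factor=0.02, focus=0.15))  # True: in range
print(within_ors(k=0.7, load_factor=0.02, focus=0.15))  # False: smooth first
```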

5.5 An Associative Search Example

The attention ability of this model is now demonstrated through an image pattern retrieval example. An MHAC memory has been created with 64 CT scan and MRI images, each of size 256x256 pixels. Fig-15 shows the full frame retrieval accuracy of this memory for each of the 64 stored images. Fig-14(a) shows a typical query image with two windows, each focusing on a cognitively significant object in it (Vertebrae and Kidney). Table-5 lists the visual specifications of these objects in terms of their four corners and their size relative to the frame. Fig-14(b) shows the corresponding matching images which have been retrieved by the memory as the associative matches. As evident, based on the focus specification, each time the memory correctly retrieved the appropriate target image. Although none of the stored pictures has global statistical similarity with the query image, both matches are correct on the basis of localized similarity. Table-6 shows the performance. As evident in these cases, the focus strength of an effective cue often lies in the range of 5-20% of the entire frame size, and MHAC can retrieve with 20-40 dB accuracy.
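A sketch of how such a window-confined query might be composed is given below. It assumes, per footnote 9, that recall reduces to a single complex inner product between the query and the holograph; the attention mask simply removes the contribution of every pixel outside the window of focus. The encoding of the memory matrix itself is not reproduced, so a random matrix stands in for a trained holograph, and all names are hypothetical.

```python
import numpy as np

def attentive_query(query_phases, memory, window):
    """query_phases: (n,) phase angles; memory: (n, m) complex holograph;
    window: (n,) boolean mask selecting the field of attention."""
    weights = window / window.sum()          # attention confined to window
    q = weights * np.exp(-1j * query_phases) # conjugate query phasors
    return q @ memory                        # one complex inner product

# Toy usage with a hypothetical 256x256 frame and a rectangular focus.
n, m = 256 * 256, 16
rng = np.random.default_rng(3)
memory = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
query = rng.uniform(-np.pi, np.pi, n)
mask = np.zeros((256, 256), dtype=bool)
mask[80:176, 80:176] = True                  # ~14% of the frame
response = attentive_query(query, memory, mask.ravel())
print(response.shape)                        # (16,) complex response label
```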



Fig-15 Decoding Accuracy: SNR (dB) vs. frame index (1-64) for the stored images of the media archive (MP=-.31 rad, AF=.05).

MASK#   Object      xmax   xmin   ymax   ymin   rho
4       Kidney      230    135    165    067    .145
6       Vertebrae   119    045    145    031    .131

Table-5 Specifications of the Windows of Focus

MASK#   Object      f      Match#      SNR (dB)
4       Kidney      .145   4.4 (A26)   31.22
6       Vertebrae   .131   6.1 (A25)   27.55

Table-6 Results of Retrieval



Most conventional AAMs are currently unable to support the demonstrated retrieval for two reasons. First, the cue sizes are far below the 50% statistical dominance barrier of current AAMs. Secondly, and more profoundly, they would not be able to converge to diverse matches from the same input pattern, since they do not support dynamic search localization.

6. CONCLUSIONS

This paper investigates the performance characteristics of a new associative memory, called attentive memory, both analytically and through computer simulations. The results strongly suggest both quantitative and qualitative advantages of this new memory over existing parallel and distributed models of associative computing. In this concluding section the principal results and their implications are briefly summarized.

Performance: It has been shown quantitatively that cues as small as 5-10% of the frame can be used effectively. This is a fundamental improvement over the capabilities of existing AAMs, which cease to retrieve when the valid part of the query pattern falls below a statistical limit of 50-70% of the original pattern size [27,8]. The ORS ranges for the focus (f=1-.1) and loading characteristics (L=.01-.04) suggest the designability of real (associative memory based) applications with this model. It has also been shown that its performance as a regular type-U memory exceeds that of most of the existing type-U equivalent models [26,10].

Scalability: Besides sustaining speedup (architectural scalability), any pseudo-optimization algorithm also requires sustained solution quality for scalability (computational scalability). The analytical and empirical evidence obtained in this work both suggest that the performance of the network is sustainable at larger problem scales (characterized by n and p). It is a well-observed phenomenon that the scalability of even the most successful ANN models (such as Backpropagation and Counterpropagation networks) is rather limited: not only does the amount of computation increase, but the convergence speed and accuracy of a conventional ANN also degrade steeply as the problem size increases. In addition to the demonstrated computational scalability, the highly structured and heavy-grain complex-valued matrix operations of this memory make it suitable for parallelization¹² and suggest simultaneous architectural scalability.

Implementation: As evident, the computational model of this memory is highly structured and repetitive. Such characteristics make the entire model implementable with easily cascadable and reusable VLSI blocks. We are currently investigating an integrated architecture using hierarchical shared memory with a set of concurrent encoding/decoding processors. As already explained, at the macro level these same properties favor highly parallel and distributed implementation on conventional MIMD parallel machines. This memory also bears excellent potential for optical realization. The hyperspherical computations map naturally onto optical computations. Also, in an optical realization, patterns can be recalled by non-mechanical means, which signifies that the access time can be on the order of several microseconds (100-1000 times faster than current Compact Disk technology). Recently, optical holographic technology has made phenomenal advancement as a storage medium [23]. The results obtained in this work directly broaden the computational potential of this promising (and ripe) technology by demonstrating its more advanced applicability in dynamic attention based associative recollection.

Potential Applications: Qualitatively, this new memory provides the novel RCA type-A and type-B capabilities within the framework of associative computing. It can potentially facilitate the solution of a whole new class of unresolved problems requiring both adaptability of model acquisition and dynamic associative recollection¹³. Content based retrieval in image archives, search in massive digital libraries, target recognition, pattern analysis in multidimensional spectral data, associative inference engines, and real time speech synthesis [12,3] are just a few of the daunting problems which fit in this class and can directly benefit from this new memory model with attention. MHAC has already been successfully used to develop the first associative memory based approach for a content-based image archival and retrieval system. This approach can overcome the subjective incoherence of traditional symbolic model mediated approaches [12,13,14].

The success of any computational model as a knowledge hub will require much more flexible and sophisticated retrieval capabilities than those offered by today's neural computing, in addition to learning and knowledge acquisition capabilities. Current models excel mostly in the latter. The demonstrated attentive memory takes current associative computing a step closer to that goal.

¹² Current generation parallel computers are characterized by their regular and structured architecture and relatively high communication cost. As a result they favor computations which are regular and heavy grain.

¹³ If one analyzes the successful applications of existing AAMs that have evolved over the last two decades, it is evident that most of them use AAMs as adaptive filters and classifiers. Consequently, in the current literature neural networks are often referred to by the term 'adaptive filter', almost as a synonym [1]. However, hardly any successful application exists which truly utilizes the associative memory property of AAMs. This state of the art reflects, on one hand, the sophistication of the learning ability and, on the other, the constraints of the associative recollection ability of current AAMs.



This research has been supported by an East West Center Fellowship. A part of this research has also been funded by the ACTS and Supercomputing in Remote, Co-operative Medical Triage Support and Radiation Treatment Planning project of ARPA under research grant DABT 63-93-C-0056.


7. REFERENCES

[1] Carpenter, G. A., "Neural Network Models for Pattern Recognition and Associative Memory", Neural Networks, v.2, 1989.

[2] Carpenter, G. A., S. Grossberg, N. Markuzon, J. H. Reynolds, & D. B. Rosen, "Attentive Supervised Learning and Recognition by Adaptive Resonance Systems", Neural Networks for Vision and Image Processing, Ed. G. A. Carpenter, S. Grossberg, MIT Press, 1992, pp364-383.

[3] Chang, S. K., Arding Hsu, "Image Information Systems: Where Do We Go From Here?", IEEE Trans. on Knowledge and Data Engineering, v.4, n.5, October 1992, pp431.

[4] Gabor, D., "A New Microscopic Principle", Nature, v.161, 1948, pp777-778.

[5] Gabor, D., "Associative Holographic Memories", IBM Journal of Research and Development, v.13, 1969, pp156-159.

[6] Grossberg, S., "Nonlinear Difference-Differential Equations in Prediction and Learning Theory", Proc. of the National Academy of Sciences, v.58, n.4, October 1967, pp1329-1334.

[7] Grossberg, S., "On the Development of Feature Detectors in the Visual Cortex with Applications to Learning and Reaction-Diffusion Systems", Biological Cybernetics, v.21, n.3, 1976, pp145-159.

[8] Hopfield, J. J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proc. of the National Academy of Sciences, USA, v.79, April 1982, pp2554-2558.

[9] Jacobson, M., Foundations of Neuroscience, Plenum Press, New York, 1993, pp173.

[10] Khan, Javed I., "Attention Modulated Associative Computing and Content Associative Search in Images", Ph.D. Dissertation, Department of Electrical Engineering, University of Hawaii, July 1995.

[11] Khan, Javed I., and D. Y. Yun, "Chaotic Vectors and a Proposal for Multidimensional Complex Associative Network", Proceedings of the SPIE/IS&T Symposium on Electronic Imaging Science & Technology '94, Conference 2185, San Jose, CA, February 1994, pp95-106.

[12] Khan, Javed I., & D. Yun, "Searching into Amorphous Information Archive", International Conference on Neural Information Processing, ICONIP'94, Seoul, October 1994, pp739-749.

[13] Khan, J. I., & D. Yun, "An Associative Memory Model for Searching Image Database by Image Snippet", Proceedings of the SPIE Conference on Visual Communication, VisCom'94, Chicago, September 1994, pp591-601.

[14] Khan, J. I., & D. Yun, "Feature Based Visual Query in Image Archive with Holographic Network", Proceedings of the International Conf. on Robotics, Control and Vision, ICARCV'94, Singapore, November 1994.

[15] Klopf, A. H., "Drive-Reinforcement Learning: A Real Time Learning Mechanism for Unsupervised Learning", Proc. of 1st IEEE Conf. on Neural Networks, Vol.II, N.J., 1987, pp441-445.

[16] Kohonen, T., Content-Addressable Memories, 2nd Ed., Springer-Verlag, Berlin, 1987.

[17] Kohonen, T., Self-Organization and Associative Memory, 3rd Ed., Springer-Verlag, Berlin, 1989.

[18] Kumar, B. V. K., P. H. Wong, "Optical Associative Memories", Artificial Neural Networks and Statistical Pattern Recognition, I. K. Sethi and A. K. Jain (Eds.), Elsevier Science Publishers, 1991, pp219-241.

[19] Leith, E. N., and J. Upatnieks, "Photography by Laser", Scientific American, June 1965.

[20] Masters, T., Signal and Image Processing with Neural Networks, John Wiley & Sons, New York, 1994.

[21] McCulloch, W. S., Walter H. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, v.5, 1943, pp115-133.

[22] Oja, E., "A Simplified Neuron Model as a Principal Component Analyzer", Journal of Mathematical Biology, v.15, 1982, pp267-273.

[23] Psaltis, D., Fai Mok, "Holographic Memories", Scientific American, November 1995, pp70-76.

[24] Whitehead, A. N. and B. Russell, Principia Mathematica, 2nd ed., Cambridge University Press, Cambridge, 1927.

[25] Sherrington, C. S., The Integrative Action of the Nervous System, Yale Univ. Press, New Haven, 1906.

[26] Sutherland, J., "Holographic Models of Memory, Learning and Expression", International J. of Neural Systems, 1(3), 1990, pp256-267.

[27] Tai, Heng-Ming, T. L. Jong, "Information Storage in High-order Neural Networks with Unequal Neural Activity", J. of Franklin Institute, v.327, n.1, 1990, pp16-32.

[28] Wenyon, Michael, Understanding Holography, Arco Publishing Inc., NY, 1978.

[29] Widrow, B., M. E. Hoff, "Adaptive Switching Circuits", IRE WESCON Convention Record, part 4, 1960, pp96-104.

[30] Willshaw, D., "Holography, Associative Memory and Inductive Generalization", in Parallel Models of Associative Memory, G. E. Hinton and J. A. Anderson, Eds., Hillsdale, NJ: Erlbaum, 1985.


Fig-14(a) Sample Query Image and Focus Objects (ABD-Q1, 256x256 CT scan; windows VM-4: Kidney, VM-6: Vertebrae)

Fig-14(b) Retrieved Images from the Attentive Memory (MATCH#4.4: ABD-RT A26, 256x256; MATCH#6.1: ABD-LT A25, 256x256)