
Neural Networks 20 (2007) 598–609 www.elsevier.com/locate/neunet

Capacity analysis for a two-level decoupled Hamming network for associative memory under a noisy environment

Liang Chen a,*, Naoyuki Tokuda b,1, Akira Nagai c

a Computer Science Department, University of Northern British Columbia, BC, Canada V2N 4Z9
b SunFlare Research and Development Center, Shinjuku Hirose Bldg, Yotsuya 4-7, Shinjuku-ku, Tokyo 160-0004, Japan
c Advanced Media Network Center, Utsunomiya University, Utsunomiya, Tochigi 321-8585, Japan

Received 12 October 2004; received in revised form 8 May 2006; accepted 8 May 2006

Abstract

Our detailed analysis has established that, in addition to the advantages of computational efficiency and easy hardware implementation, the two-level decoupled Hamming network possesses a substantially higher capacity than the single-level Hamming associative memory, since the effect caused by Ikeda et al.'s uniform random noise [Ikeda, N., Watta, P., Artiklar, M., & Hassoun, M. (2001). A two-level Hamming network for high performance associative memory. Neural Networks, 14(9), 1189–1200] is much smaller than that caused by the practically more prevalent concentrated noise. We therefore conclude that the two-level decoupled Hamming network with middle-sized windows should be an elegant associative memory model in all the senses of efficiency, hardware implementation and capacity.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Capacity analysis; Hamming network; Associative memory; Voting

1. Introduction

An associative memory is a memory that is addressed through its contents. The behavior of an associative memory is very close to our human behavior in remembering something in the presence of a sensory cue. When an input pattern, called a memory key, is presented, the associative memory should return a stored memory pattern coincident with the key. The coincidence between a memory key and a stored pattern need not and should not be perfect. The associative memory should be able to recall a stored pattern that is similar to the memory key, so that noise-polluted inputs can also be recognized. Associative memories find applications in the areas of pattern classification, image and voice recognition, robotics, databases, and so on (Ikeda, Watta, Artiklar, & Hassoun, 2001).

Many neural models have been proposed for associative memories (Chiueh & Goodman, 1991; Chou, 1989; Hassoun, 1993; Hopfield, 1982), but it seems that no practical package is available up to now due to many serious design flaws.

* Corresponding author. Tel.: +1 250 9605838; fax: +1 250 9605544.
E-mail addresses: [email protected] (L. Chen), tokuda n@sunflare.co.jp (N. Tokuda), [email protected] (A. Nagai).
1 Tel.: +81 3 3355 1383; fax: +81 3 3355 1174.

0893-6080/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2006.05.045

The Hamming associative memory, which is conceptually the simplest, seems to be a promising model to meet most of the essential design criteria (Ikeda et al., 2001; Ikeda, Watta, & Hassoun, 1998). It has been shown (Watta, Wang, & Hassoun, 1997) that a Hamming associative memory can, in theory, be implemented with a collection of AND and OR gates. However, the hardware implementation suffers from practical difficulties because of the requirement of an excessive number of interconnections, and from computational ineffectiveness since each memory key has to be compared with all memory patterns.

Very recently, Ikeda et al. (2001) presented a two-level decoupled Hamming memory, which is generalized from a single-level Hamming associative memory. In the first level of a two-level decoupled Hamming memory, the memory key is partitioned into non-overlapped blocks, each of which performs Hamming associative memory operations to determine a closest local memory pattern. The second level of a decoupled Hamming memory acts as a decision network to return an entire memory pattern based on a voting strategy. It has been shown that the two-level decoupled Hamming memory retains a high performance and allows for a simpler hardware implementation


since each lower-level Hamming memory involves only a block of inputs (see Ikeda et al. (2001) for detailed discussions of the two-level Hamming memory). But it has also been shown that a two-level decoupled Hamming memory has a lower capacity for tolerating uniform random noise in comparison with a single-level Hamming associative memory (Ikeda et al., 2001).²

This paper follows the basic assumptions of Ikeda et al. (2001) on the uniform distribution of memory patterns and on the uniform distribution of the random noise that corrupts a memory pattern into a memory key to be recognized. We further assume the presence of so-called concentrated noise, which corrupts parts of memory keys. While memory patterns and memory keys can be one-dimensional, two-dimensional or multi-dimensional, depending on the application, we will concentrate our analysis on two-dimensional patterns (and keys); it is easy to see that the conclusions are also valid for one- or higher-dimensional situations. We show that, although a two-level decoupled Hamming memory has a lower capacity than a one-level Hamming memory in the presence of uniform random noise, the difference is actually minor, and that this slightly lower capacity is more than compensated by the substantially higher capacity of the two-level decoupled Hamming memory, relative to the one-level Hamming memory, against concentrated noise, which is practically more prevalent. The capacity of an associative memory is defined as the maximum number of memory patterns that can be stored reliably at a fixed noise level (Kawamura & Hirai, 1997), where reliable means that the probability of correct retrieval is higher than a certain level. It can be estimated, as we will do in this paper, by computing the probability that the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise.

2. Hamming associative memory, two-level Hamming memory and basic assumption on memory patterns

2.1. Hamming associative memory

Throughout this paper, we follow the definition of Ikeda et al. (2001), discussing only the auto-associative memory for two-dimensional patterns. In this case, each memory pattern $X^i_{M_1 \times M_2}$ is an integer array of size $M_1 \times M_2$. Here $i \in \{1, 2, \ldots, m\}$ is the memory pattern index. A Hamming associative memory returns a memory pattern $X^k_{M_1 \times M_2}$ for an input memory key $Y_{M_1 \times M_2}$ if
$$d(X^k_{M_1 \times M_2}, Y_{M_1 \times M_2}) < d(X^i_{M_1 \times M_2}, Y_{M_1 \times M_2})$$
for all $i \neq k$, where $d(\cdot, \cdot)$ is defined as the city block distance between two arrays.

To simplify the analysis, we follow Ikeda et al. (2001) and only consider binary memory patterns, that is, each memory pattern is a 0-1 array. In this case, the city block distance actually reduces to the Hamming distance.
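To make the retrieval rule concrete, the following is a minimal Python sketch (ours, not part of the original paper) of a one-level Hamming associative memory over 0-1 arrays; the function name and the use of NumPy are our own choices, and ties are broken by the lowest pattern index.

    import numpy as np

    def hamming_recall(memories, key):
        # memories: (m, M1, M2) 0-1 array; key: (M1, M2) 0-1 array.
        # For 0-1 arrays the city block distance reduces to the Hamming distance.
        distances = np.abs(memories - key).sum(axis=(1, 2))
        # Return the stored pattern closest to the key (lowest index on ties).
        return memories[np.argmin(distances)]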

2 The derivation in Ikeda et al. (2001) actually has minor errors. It claims that the random variables representing the numbers of votes received by different memories are independent. This is wrong, as we can see by a simple example where the memory set consists of only two memory patterns.

2.2. Two-level decoupled Hamming memory

A two-level decoupled Hamming memory is set up by partitioning each of the memory patterns into a set of sub-memory patterns, and associating the corresponding sub-memory patterns of all the memory patterns with a local memory set. The decoupled Hamming memory is operated as follows: it firstly partitions the memory key into sub-memory keys in the same fashion as the memory patterns were partitioned, calculates the city block distances locally between the sub-memory key and each sub-memory pattern in the corresponding local memory set to decide the closest sub-memory pattern to the sub-memory key in each local memory set, and then returns a "closest" memory pattern by employing a voting mechanism on the results obtained from all local memory sets. The architecture of a typical two-level Hamming memory for two-dimensional memory keys is shown in Fig. 1.

Formally, we always suppose $M_1$ and $M_2$ are divisible by $R_1$ and $R_2$ respectively. We partition each $X^i_{M_1 \times M_2}$ ($1 \le i \le m$) into a set of sub-memory patterns $X^{i,1,1}_{R_1 \times R_2}, X^{i,1,2}_{R_1 \times R_2}, \ldots, X^{i,M_1/R_1,M_2/R_2}_{R_1 \times R_2}$, and set up a local memory set $\{X^{1,j,t}_{R_1 \times R_2}, X^{2,j,t}_{R_1 \times R_2}, \ldots, X^{m,j,t}_{R_1 \times R_2}\}$ for each tuple $(j, t)$ ($1 \le j \le M_1/R_1$ and $1 \le t \le M_2/R_2$). Then, for any input memory key $Y$, we partition it into $\{Y^{1,1}_{R_1 \times R_2}, Y^{1,2}_{R_1 \times R_2}, \ldots, Y^{M_1/R_1,M_2/R_2}_{R_1 \times R_2}\}$, and calculate the city block distances between the sub-memory key $Y^{j,t}_{R_1 \times R_2}$ and the sub-memory patterns $X^{i,j,t}_{R_1 \times R_2}$, $1 \le i \le m$, to find the index of the closest sub-memory pattern to the sub-memory key. Finally, we vote on the locally determined sub-memory pattern indexes using a simple majority scheme to return a memory pattern.
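A corresponding sketch of the two-level decoupled operation is given below. It is our own illustration under the assumptions stated above (M1 and M2 divisible by R1 and R2); each R1 × R2 window votes for the memory whose sub-pattern is closest to the corresponding sub-key, and the memory with the most votes is returned, which is the simple majority scheme described here (Section 3 discusses the pair-wise best alternative).

    import numpy as np

    def two_level_recall(memories, key, R1, R2):
        # memories: (m, M1, M2) 0-1 array; key: (M1, M2) 0-1 array.
        m, M1, M2 = memories.shape
        votes = np.zeros(m, dtype=int)
        for r in range(0, M1, R1):
            for c in range(0, M2, R2):
                sub_key = key[r:r + R1, c:c + R2]
                sub_mem = memories[:, r:r + R1, c:c + R2]
                # Local Hamming memory: distance of every sub-memory pattern to the sub-key.
                local_dist = np.abs(sub_mem - sub_key).sum(axis=(1, 2))
                votes[np.argmin(local_dist)] += 1   # this window votes for its closest memory
        return memories[np.argmax(votes)]           # the memory with the most votes wins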

2.3. Basic assumption on the memory patterns

We follow the basic assumptions of Ikeda et al. (2001): each memory pattern is generated with a uniform 0.5 probability of each bit being 1 or 0, with the memory key being a corrupted version of one of the memory patterns, named the target memory, when subjected to certain noise.

There are many types of noise, of course. Basically, two types of noise have been treated in the stability analysis literature (Chen & Tokuda, 2003), namely uniform random noise and concentrated noise. As in Ikeda et al. (2001), we assume that uniform random noise corrupts the target memory pattern into a memory key by changing the value of each bit of the memory pattern with probability p. We further assume that concentrated noise, on the other hand, appears in blocks and changes the values of all polluted bits to the opposite values (i.e., from 1 to 0, and from 0 to 1).³ Formally, concentrated noise is here defined as a union of non-overlapped blocks of noise corrupting blocks of the two-dimensional target memory pattern into the memory key, such that each bit in each corrupted block of the target memory pattern flips its value.

3 We do not consider "double" polluting; i.e., when a bit is polluted by both concentrated noise and uniform random noise, we suppose its value changes only once.


Fig. 1. Architecture of a typical two-level Hamming memory for two-dimensional memory keys. It is set up by partitioning each of the memory patterns into a set of sub-memory patterns, and associating the corresponding sub-memory patterns of all the memory patterns with a local memory set. It is operated as follows: it firstly partitions the memory key into sub-memory keys in the same fashion as the memory patterns are partitioned, calculates the city block distances locally between the sub-memory key and each sub-memory pattern in the corresponding local memory set to decide the closest sub-memory pattern to the sub-memory key in each local memory set, then returns a "closest" memory pattern by employing a voting mechanism on the results obtained from all local memory sets.

To simplify the analysis, we assume each concentrated noise block occupies as many sub-memory patterns as possible. We will pay particular attention to the sub-memory patterns which are "covered" by a noise block in their entirety, which means that, in the analysis of the two-level decoupled Hamming memory, all the bits in these sub-memory patterns flip their values. We call these sub-memory patterns fully polluted sub-memory patterns (by concentrated noise). For the parts of noise blocks that do not occupy an entire sub-memory pattern, we suppose that they are distributed evenly over all the sub-memory patterns which are not fully polluted by concentrated noise.

The amount of uniform random noise can be measured by $pM_1M_2$, and the amount of concentrated noise by the total size $S$ of the noise blocks; each block is assumed to be of size $N_1 \times N_2$.
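For concreteness, a small sketch of this noise model is given below; it is our own illustration, with the block positions supplied by the caller, bit-flip probability p for the uniform random noise, and the "no double pollution" convention of footnote 3.

    import numpy as np

    def corrupt(pattern, p, blocks, rng):
        # pattern: (M1, M2) 0-1 integer array; blocks: list of (row, col, N1, N2).
        key = pattern.copy()
        concentrated = np.zeros(pattern.shape, dtype=bool)
        for r, c, N1, N2 in blocks:
            concentrated[r:r + N1, c:c + N2] = True
        key[concentrated] ^= 1                    # concentrated noise flips every bit in a block
        uniform = rng.random(pattern.shape) < p   # uniform random noise, probability p per bit
        key[uniform & ~concentrated] ^= 1         # a doubly polluted bit changes only once
        return key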

3. Capacity analysis

Let us consider the situation when $m > 2$. In such a case, to determine the closest pattern to be returned for an input memory key, there are many voting strategies for determining the final output memory pattern. "Pair-wise best" and "best of all" are the two most popular voting strategies. Using the "pair-wise best" scheme, a memory pattern is returned if and only if it is independently "closer" to the memory key than any other memory pattern under pair-wise comparison. Using the "best of all" scheme, a memory pattern is returned only if it is "closer" to the memory key when all the memory patterns are involved in the comparison.

For single-level Hamming associative memories, the "pair-wise" best (closest) is always the best (closest) of all. However, this is not always true for a two-level decoupled Hamming associative memory. A simple example is given below. Suppose we have a memory key K and three memory patterns, namely A, B and C. We partition each of the memory patterns into 7 sub-memory patterns, and associate the corresponding sub-memory patterns of all the memory patterns with a local memory set, so that there are 7 local memory sets, namely L1, L2, . . . , L7. In L1 and L2, the sub-memories of memory pattern C are "closer" to memory key K than those of A, and the sub-memories of memory pattern A are "closer" to memory key K than those of B; in local memory sets L3 and L4, the sub-memories of memory pattern A are "closer" to memory key K than


those of B, and the sub-memories of memory pattern B are "closer" to memory key K than those of C; in the remaining 3 local memory sets, the sub-memories of memory pattern B are "closer" to memory key K than those of A, and the sub-memories of memory pattern A are "closer" to memory key K than those of C. In this example, the pair-wise best chooses memory pattern A while the best of all selects memory pattern B.⁴

Although the "pair-wise best" is not always the "best of all", there is no reason to believe that the "best of all" is more reasonable. Actually, the pair-wise best satisfies a major criterion in voting theory (Arrow, 1963; Luce & Raiffa, 1957), namely the independence of irrelevant alternatives criterion, which means that the societal preference between two alternatives should be independent of preferences for other options; or, we could say, the winning candidate should remain the winner if one or more of the losing candidates drop out. For mathematical simplicity, we choose the pair-wise best scheme for the cases where there are more than two memory patterns, so that the probability value for an associative memory with m memories is the (m − 1)-th power of that for an associative memory with 2 memories. Because of this, to simplify the analysis, we here only have to consider the case of m = 2.
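The two voting schemes can be compared directly on a table of local distances. The sketch below is our own illustration: D[i, w] holds the local distance of memory i in window w, and the example matrix uses arbitrary distance values that reproduce the ordering of the three-pattern, seven-window example above (worked out in detail in footnote 4).

    import numpy as np

    def best_of_all(D):
        # Each window votes for its closest memory; return the memory with the most votes.
        votes = np.bincount(np.argmin(D, axis=0), minlength=D.shape[0])
        return int(np.argmax(votes))

    def pairwise_best(D):
        # Return the memory that wins more windows than every other memory in
        # pair-wise comparison, or None if no such memory exists.
        m = D.shape[0]
        for a in range(m):
            if all(np.sum(D[a] < D[b]) > np.sum(D[b] < D[a])
                   for b in range(m) if b != a):
                return a
        return None

    # The example above: memories A, B, C (rows) and 7 windows (columns).
    D = np.array([[1, 1, 0, 0, 1, 1, 1],    # A
                  [2, 2, 1, 1, 0, 0, 0],    # B
                  [0, 0, 2, 2, 2, 2, 2]])   # C
    print(best_of_all(D), pairwise_best(D))  # best of all: 1 (B); pair-wise best: 0 (A)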

We would like to remind the reader here that we always use $\Phi(x)$ to denote $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy$, which is the distribution function of a standard normal distribution. In the following discussion, we always assume $M_1M_2$ is a large number.

3.1. Capacity of one-level Hamming associative memory

Theorem 1. The probability with which a one-level associative memory returns the target memory is
$$\mathrm{Prob}_{\text{1-level}} = \sum_{j=S}^{M_1M_2} \binom{M_1M_2 - S}{j - S} p^{\,j-S}(1-p)^{M_1M_2-j}\,\Phi\!\left(\frac{0.5M_1M_2 - j}{\sqrt{0.25M_1M_2}}\right). \qquad (1)$$

For large $M_1M_2 - S$, this can be approximated by the following equation:
$$\mathrm{Prob}_{\text{1-level}} = \Phi\!\left(\frac{(0.5-p)M_1M_2 - (1-p)S}{\sqrt{(M_1M_2 - S)\,p(1-p) + 0.25M_1M_2}}\right). \qquad (2)$$
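The two expressions in Theorem 1 can be evaluated numerically; the sketch below is ours, uses SciPy, and the parameter values in the final line are chosen purely for illustration.

    import numpy as np
    from scipy.stats import norm, binom

    def prob_one_level_exact(M, S, p):
        # Eq. (1): M = M1*M2; sum over the possible distances j to the target.
        j = np.arange(S, M + 1)
        pmf = binom.pmf(j - S, M - S, p)                      # Prob(D_t = j)
        phi = norm.cdf((0.5 * M - j) / np.sqrt(0.25 * M))     # Prob(D_n > j), normal approx.
        return float(np.sum(pmf * phi))

    def prob_one_level_approx(M, S, p):
        # Eq. (2): normal approximation for large M - S.
        num = (0.5 - p) * M - (1 - p) * S
        den = np.sqrt((M - S) * p * (1 - p) + 0.25 * M)
        return float(norm.cdf(num / den))

    print(prob_one_level_exact(64 * 64, 256, 0.1),
          prob_one_level_approx(64 * 64, 256, 0.1))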

Proof. We denote the city block distances from the memory key to the target memory pattern and to the non-target pattern by $D_t$ and $D_n$, respectively.

For each element in the memory key, we let $d_t$ and $d_n$ denote its distance to the corresponding elements in the target and in the non-target memory pattern.

It is easy to see that, if the element is polluted by concentrated noise, $d_t$ is always 1; otherwise $\mathrm{Prob}(d_t = 1) = p$ and $\mathrm{Prob}(d_t = 0) = 1 - p$. Hence, for each $j \in \{S, S+1, \ldots, M_1M_2\}$,
$$\mathrm{Prob}(D_t = j) = \binom{M_1M_2 - S}{j - S} p^{\,j-S}(1-p)^{M_1M_2-j}.$$

For large $M_1M_2 - S$, the Central Limit Theorem shows that the random variable $\xi_t = \bigl(D_t - S - p(M_1M_2 - S)\bigr)/\sqrt{p(1-p)(M_1M_2 - S)}$ can be approximated by the standard normal distribution $N(0, 1)$.

On the other hand, it is easy to show that $\mathrm{Prob}(d_n = 1) = \mathrm{Prob}(d_n = 0) = 0.5$ for each element in the memory key. By the Central Limit Theorem, the random variable $\xi_n = (D_n - 0.5M_1M_2)/\sqrt{0.25M_1M_2}$ can be approximated by the standard normal distribution $N(0, 1)$, as we assume $M_1M_2$ is a large number. A one-level Hamming associative memory returns the target memory pattern if and only if $D_t < D_n$, so that
$$\mathrm{Prob}_{\text{1-level}} = \sum_{j=S}^{M_1M_2} \binom{M_1M_2 - S}{j - S} p^{\,j-S}(1-p)^{M_1M_2-j}\,\Phi\!\left(\frac{0.5M_1M_2 - j}{\sqrt{0.25M_1M_2}}\right).$$

In the case that $M_1M_2 - S$ is large, it is easy to show that $D_t - D_n < 0$ means $\eta < 0$, where
$$\eta = \Bigl(\sqrt{(M_1M_2 - S)p(1-p)}\,\xi_t + S + p(M_1M_2 - S)\Bigr) - \Bigl(\sqrt{0.25M_1M_2}\,\xi_n + 0.5M_1M_2\Bigr) = \sqrt{(M_1M_2 - S)p(1-p)}\,\xi_t - \sqrt{0.25M_1M_2}\,\xi_n + (1-p)S - (0.5-p)M_1M_2.$$
As $\xi_t$ and $\xi_n$ are independent random variables, $\eta$ can be approximated by a normal distribution⁵ $N\bigl((1-p)S - (0.5-p)M_1M_2,\ p(1-p)(M_1M_2 - S) + 0.25M_1M_2\bigr)$. Eq. (2) can now be obtained from the definition of the distribution function for a normal distribution. □

4 Under the "best-of-all" strategy, B is the closest to K in 3 sub-memory sets, while C is the closest in 2 sub-memory sets, and A is the closest in the remaining 2 sub-memory sets; therefore, B is the winner: the output memory pattern, i.e., B, is closer to K than the other two. Under the "pair-wise-best" strategy, A is closer to K when it is compared to B, since A wins in 4 sub-memory sets L1, L2, L3 and L4, while B only wins in the remaining 3 sub-memory sets L5, L6 and L7; A is also closer to K when it is compared to C, since A wins in 5 sub-memory sets L3, L4, L5, L6 and L7, while C only wins in the remaining 2 sub-memory sets L1 and L2. Therefore, the output memory under the "pair-wise-best" strategy will be A.

5 This can be proved by noticing that the characteristic function of a normal distribution $N(a, \delta^2)$ is $e^{iat - \frac{1}{2}\delta^2 t^2}$, while the characteristic function of a random variable $\eta = a\xi + b$ is $e^{ibt} f(at)$, where $f(t)$ is the characteristic function of the variable $\xi$.
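Returning to Theorem 1, a quick Monte Carlo check of the approximation in Eq. (2) can be run as below; this is our own sanity-check sketch with arbitrary parameter values, and, since only the count of concentrated-noise bits matters for the one-level memory, the S corrupted bits are simply taken to be the first S positions.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    M1, M2, p, S = 32, 32, 0.1, 64
    hits, trials = 0, 2000
    for _ in range(trials):
        target, other = rng.integers(0, 2, (2, M1, M2))
        key = target.copy().ravel()
        key[:S] ^= 1                                  # concentrated noise: S bits surely flipped
        flips = rng.random(M1 * M2 - S) < p           # uniform noise on the remaining bits
        key[S:] ^= flips.astype(key.dtype)
        key = key.reshape(M1, M2)
        hits += int(np.abs(key - target).sum() < np.abs(key - other).sum())

    approx = norm.cdf(((0.5 - p) * M1 * M2 - (1 - p) * S) /
                      np.sqrt((M1 * M2 - S) * p * (1 - p) + 0.25 * M1 * M2))
    print(hits / trials, float(approx))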

3.2. Capacity of two-level decoupled Hamming associative memory

Theorem 2. The probability with which a two-level decoupled Hamming associative memory returns the target memory is
$$\mathrm{Prob}_{\text{2-level}} = \sum_{\substack{0 \le k \le n_p,\ i + j \le \frac{M_1M_2}{R_1R_2} - n_p \\ k + j < i}} \binom{n_p}{k} P_s^{\,k} (1 - P_s)^{n_p - k} \binom{\frac{M_1M_2}{R_1R_2} - n_p}{i} P_t^{\,i} \binom{\frac{M_1M_2}{R_1R_2} - n_p - i}{j} P_n^{\,j} P_o^{\frac{M_1M_2}{R_1R_2} - n_p - i - j}, \qquad (3)$$


where
$$P_s = 1 - 0.5^{R_1R_2}, \qquad (4)$$
$$P_t = \sum_{\lceil n_f \rceil \le i < j \le R_1R_2} \binom{R_1R_2 - \lceil n_f \rceil}{i - \lceil n_f \rceil} p^{\,i - \lceil n_f \rceil}(1-p)^{R_1R_2 - i} \binom{R_1R_2}{j} \left(\frac{1}{2}\right)^{R_1R_2}, \qquad (5)$$
$$P_n = \sum_{\lceil n_f \rceil \le i \le R_1R_2} \binom{R_1R_2 - \lceil n_f \rceil}{i - \lceil n_f \rceil} p^{\,i - \lceil n_f \rceil}(1-p)^{R_1R_2 - i} \sum_{j=0}^{i-1} \binom{R_1R_2}{j} \left(\frac{1}{2}\right)^{R_1R_2}, \qquad (6)$$
$$P_o = 1 - P_n - P_t, \qquad (7)$$
$$n_f = \frac{\frac{S}{N_1N_2}\left(N_1N_2 - \left\lfloor \frac{N_1}{R_1} \right\rfloor \left\lfloor \frac{N_2}{R_2} \right\rfloor R_1R_2\right)}{\frac{M_1M_2}{R_1R_2} - \frac{S}{N_1N_2}\left\lfloor \frac{N_1}{R_1} \right\rfloor \left\lfloor \frac{N_2}{R_2} \right\rfloor}, \qquad (8)$$
$$n_p = \frac{S}{N_1N_2} \cdot \left\lfloor \frac{N_1}{R_1} \right\rfloor \cdot \left\lfloor \frac{N_2}{R_2} \right\rfloor. \qquad (9)$$

For large $R_1R_2$, Eqs. (5) and (6) can be approximated by
$$P_t = \sum_{i=\lceil n_f \rceil}^{R_1R_2 - 1} \binom{R_1R_2 - \lceil n_f \rceil}{i - \lceil n_f \rceil} p^{\,i - \lceil n_f \rceil}(1-p)^{R_1R_2 - i} \left(1 - \Phi\!\left(\frac{(i+1) - 0.5R_1R_2}{\sqrt{0.25R_1R_2}}\right)\right), \qquad (10)$$
$$P_n = \sum_{i=\lceil n_f \rceil}^{R_1R_2} \binom{R_1R_2 - \lceil n_f \rceil}{i - \lceil n_f \rceil} p^{\,i - \lceil n_f \rceil}(1-p)^{R_1R_2 - i}\, \Phi\!\left(\frac{(i-1) - 0.5R_1R_2}{\sqrt{0.25R_1R_2}}\right). \qquad (11)$$

When $R_1R_2 - n_f$ is large, Eqs. (5) and (6) can further be simplified to
$$P_t = \Phi\!\left(\frac{(0.5 - p)R_1R_2 - (1-p)n_f}{\sqrt{(R_1R_2 - n_f)\,p(1-p) + 0.25R_1R_2}}\right), \qquad (12)$$
$$P_n = 1 - \Phi\!\left(\frac{(0.5 - p)R_1R_2 - (1-p)n_f}{\sqrt{(R_1R_2 - n_f)\,p(1-p) + 0.25R_1R_2}}\right). \qquad (13)$$

Eq. (3) can be simplified as follows.

When $n_p$ is large, and $P_s$ is not too close to 0:
$$\mathrm{Prob}_{\text{2-level}} = \sum_{\substack{j < i \\ i + j \le \frac{M_1M_2}{R_1R_2} - n_p}} \binom{\frac{M_1M_2}{R_1R_2} - n_p}{i} P_t^{\,i} \binom{\frac{M_1M_2}{R_1R_2} - n_p - i}{j} P_n^{\,j} P_o^{\frac{M_1M_2}{R_1R_2} - n_p - i - j}\, \Phi\!\left(\frac{i - j - 1 - n_p P_s}{\sqrt{n_p P_s(1 - P_s)}}\right). \qquad (14)$$

When $\frac{M_1M_2}{R_1R_2} - n_p$ is large, and $|P_n - P_t|$ is not too close to 1:
$$\mathrm{Prob}_{\text{2-level}} = \sum_{k=0}^{n_p} \binom{n_p}{k} P_s^{\,k} (1 - P_s)^{n_p - k}\, \Phi\!\left(\frac{-k - 1 - \left(\frac{M_1M_2}{R_1R_2} - n_p\right)(P_n - P_t)}{\sqrt{\left(\frac{M_1M_2}{R_1R_2} - n_p\right)\left(P_n + P_t - (P_n - P_t)^2\right)}}\right). \qquad (15)$$

When both $n_p$ and $\frac{M_1M_2}{R_1R_2} - n_p$ are large, and neither $P_s$ nor $|P_n - P_t|$ is too close to 1:
$$\mathrm{Prob}_{\text{2-level}} = \Phi\!\left(\frac{-\left(\frac{M_1M_2}{R_1R_2} - n_p\right)(P_n - P_t) - n_p P_s}{\sqrt{\left(\frac{M_1M_2}{R_1R_2} - n_p\right)\left(P_n + P_t - (P_n - P_t)^2\right) + n_p P_s(1 - P_s)}}\right). \qquad (16)$$
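For illustration, the large-size approximation of Theorem 2 can be computed as in the sketch below (our own code, using SciPy); it combines Eqs. (4), (8), (9), (12), (13) and (16), and it assumes, as in the text, that M1 and M2 are divisible by R1 and R2 and that S is a whole number of N1 × N2 blocks. The parameter values in the final line are arbitrary.

    import numpy as np
    from math import floor
    from scipy.stats import norm

    def prob_two_level(M1, M2, R1, R2, N1, N2, S, p):
        W = (M1 * M2) / (R1 * R2)                      # number of windows (local memory sets)
        blocks = S / (N1 * N2)                         # number of concentrated-noise blocks
        covered = floor(N1 / R1) * floor(N2 / R2)      # windows fully covered by one block
        n_p = blocks * covered                         # Eq. (9): fully polluted windows
        n_f = blocks * (N1 * N2 - covered * R1 * R2) / (W - n_p)   # Eq. (8)
        P_s = 1 - 0.5 ** (R1 * R2)                     # Eq. (4)
        z = ((0.5 - p) * R1 * R2 - (1 - p) * n_f) / \
            np.sqrt((R1 * R2 - n_f) * p * (1 - p) + 0.25 * R1 * R2)
        P_t, P_n = norm.cdf(z), 1 - norm.cdf(z)        # Eqs. (12) and (13)
        mu = (W - n_p) * (P_n - P_t) + n_p * P_s       # mean of V_n - V_t
        var = (W - n_p) * (P_n + P_t - (P_n - P_t) ** 2) + n_p * P_s * (1 - P_s)
        return float(norm.cdf(-mu / np.sqrt(var)))     # Eq. (16): Prob(V_n - V_t < 0)

    print(prob_two_level(M1=72, M2=72, R1=9, R2=9, N1=18, N2=18, S=2 * 18 * 18, p=0.3))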

Proof. We let $V_t$ and $V_n$ denote the numbers of "votes" that the target memory pattern and the non-target memory pattern receive from the two-level Hamming associative memory. Obviously, the two-level Hamming associative memory returns the target memory if and only if $V_t > V_n$.

We further let $V^{(1)}_t$ and $V^{(1)}_n$ denote the total numbers of "votes" that the target memory pattern and the non-target memory pattern receive in the local memory sets of which the corresponding sub-memory keys are fully polluted by concentrated noise, and let $V^{(2)}_t$ and $V^{(2)}_n$ denote the total numbers of "votes" that the target memory pattern and the non-target memory pattern receive from the local memory sets of which the corresponding sub-memory keys are not fully polluted by concentrated noise. Clearly, $V_t = V^{(1)}_t + V^{(2)}_t$ and $V_n = V^{(1)}_n + V^{(2)}_n$.

Based on the assumptions in the earlier section, the number of local memory sets of which the sub-memory keys are fully polluted by concentrated noise is $n_p = \frac{S}{N_1N_2}\lfloor N_1/R_1 \rfloor \lfloor N_2/R_2 \rfloor$. In each of these local memory sets, the city block distance from the sub-memory of the target memory to the sub-memory key is always $R_1R_2$, which cannot be smaller than that from the sub-memory of the non-target memory to the sub-memory key. Consequently, we have $V^{(1)}_t = 0$ and $V^{(1)}_n \le n_p$.

In each such memory set, the chance that the sub-memory key is closer to the non-target memory than to the target memory is
$$P_s = 1 - 0.5^{R_1R_2}.$$
Hence, the probability that $V^{(1)}_n = k$ is
$$\mathrm{Prob}(V^{(1)}_n = k) = \mathrm{Prob}(V^{(1)}_n - V^{(1)}_t = k) = \binom{n_p}{k} P_s^{\,k} (1 - P_s)^{n_p - k}. \qquad (17)$$

When $n_p$ is large and $P_s$ is not close to 1, $V^{(1)}_n$ can be approximated by a normal distribution $N\bigl(n_p P_s,\ n_p P_s(1 - P_s)\bigr)$.


Thus,
$$\mathrm{Prob}(V^{(1)}_n \le k) = \mathrm{Prob}(V^{(1)}_n - V^{(1)}_t \le k) = \Phi\!\left(\frac{k - n_p P_s}{\sqrt{n_p P_s(1 - P_s)}}\right). \qquad (18)$$

We now consider all the remaining $\frac{M_1M_2}{R_1R_2} - n_p$ local memory sets of which the corresponding sub-memory keys are not fully polluted by concentrated noise. For each such local memory set, let $d_{lt}$ and $d_{ln}$ denote the distance from the sub-memory pattern of the target memory and the distance from the sub-memory pattern of the non-target memory to the corresponding sub-memory key, respectively. Notice that we assume that the "cut-off" portions of the concentrated noise blocks are distributed evenly over these regions. Hence, each sub-memory key that is not fully polluted by concentrated noise includes
$$n_f = \frac{S - \frac{S}{N_1N_2}\left\lfloor \frac{N_1}{R_1} \right\rfloor \left\lfloor \frac{N_2}{R_2} \right\rfloor R_1R_2}{\frac{M_1M_2}{R_1R_2} - \frac{S}{N_1N_2}\left\lfloor \frac{N_1}{R_1} \right\rfloor \left\lfloor \frac{N_2}{R_2} \right\rfloor}$$
elements whose values were flipped from the target memory pattern by concentrated noise. Consequently, we have $d_{lt} \ge n_f$. The probabilities that a local Hamming associative memory returns the target memory, that it returns the non-target memory, or that it is tied up⁶ are $P_t = \mathrm{Prob}(d_{lt} < d_{ln})$, $P_n = \mathrm{Prob}(d_{lt} > d_{ln})$ and $P_o = \mathrm{Prob}(d_{lt} = d_{ln})$, respectively. We should have no difficulty in expanding these expressions into Eqs. (5)–(7).

For large $R_1R_2$, Eqs. (5) and (6) can be simplified to Eqs. (10) and (11), since the random variable $d_{ln}$ can be approximated by a normal distribution $N(0.5R_1R_2, 0.25R_1R_2)$.

When $R_1R_2 - n_f$ is large, $d_{lt} - n_f$ can also be approximated by a normal distribution $N\bigl(p(R_1R_2 - n_f),\ p(1-p)(R_1R_2 - n_f)\bigr)$. Hence, $d_{lt} - d_{ln}$ can be approximated by a normal distribution $N\bigl((1-p)n_f - (0.5-p)R_1R_2,\ p(1-p)(R_1R_2 - n_f) + 0.25R_1R_2\bigr)$ (see footnote 5). According to the definition of the distribution function, we can then obtain Eqs. (12) and (13). We can use exactly the same technique as in the proof of Theorem 1 to approximate $P_t$ and $P_n$ by $P_t = \frac{1}{\sqrt{2\pi}}\int_{x_2}^{+\infty} e^{-y^2/2}\,dy$ and $P_n = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x_3} e^{-y^2/2}\,dy$, for appropriate integration limits $x_2$ and $x_3$.

We can see that, over all the sub-memory sets of which the corresponding sub-memory keys are not fully polluted by concentrated noise, the probability that $V^{(2)}_t = i$ and $V^{(2)}_n = j$ (with $i + j \le \frac{M_1M_2}{R_1R_2} - n_p$) is
$$\mathrm{Prob}(V^{(2)}_t = i,\ V^{(2)}_n = j) = \binom{\frac{M_1M_2}{R_1R_2} - n_p}{i} P_t^{\,i} \binom{\frac{M_1M_2}{R_1R_2} - n_p - i}{j} P_n^{\,j} P_o^{\frac{M_1M_2}{R_1R_2} - n_p - i - j}. \qquad (19)$$

Note that the contributions of the local memory sets with $d_{lt} < d_{ln}$, $d_{lt} > d_{ln}$ and $d_{lt} = d_{ln}$ to the variable $V^{(2)}_n - V^{(2)}_t$ are $-1$, $+1$ and $0$, respectively; it is easy to calculate the expected value and variance of the contribution of each such local memory set as follows:
$$\mu = P_n - P_t,$$
$$\delta^2 = P_t\bigl(-1 - (P_n - P_t)\bigr)^2 + P_n\bigl(1 - (P_n - P_t)\bigr)^2 + P_o\bigl(0 - (P_n - P_t)\bigr)^2 = P_n + P_t - (P_n - P_t)^2.$$

6 Although a tied-up case seldom happens, we list it here for completeness.

Thus, in the case that $\frac{M_1M_2}{R_1R_2} - n_p$ is large and $|P_n - P_t|$ is not too close to 1, $V^{(2)}_n - V^{(2)}_t$ follows a normal distribution $N\Bigl(\bigl(\frac{M_1M_2}{R_1R_2} - n_p\bigr)(P_n - P_t),\ \bigl(\frac{M_1M_2}{R_1R_2} - n_p\bigr)\bigl(P_n + P_t - (P_n - P_t)^2\bigr)\Bigr)$. Hence, we have
$$\mathrm{Prob}(V^{(2)}_n - V^{(2)}_t \le x) = \Phi\!\left(\frac{x - \left(\frac{M_1M_2}{R_1R_2} - n_p\right)(P_n - P_t)}{\sqrt{\left(\frac{M_1M_2}{R_1R_2} - n_p\right)\left(P_n + P_t - (P_n - P_t)^2\right)}}\right). \qquad (20)$$

We can show that, if $V^{(1)}_n - V^{(1)}_t = k$, then for the two-level decoupled Hamming associative memory to return the target memory we must have $V^{(2)}_t - V^{(2)}_n \ge k + 1$. Consequently, we can obtain Eq. (3), representing the probability that $V_t - V_n > 0$, by using Eqs. (17) and (19).

We now consider the special cases when either $n_p$ or $\frac{M_1M_2}{R_1R_2} - n_p$ is large, or both of them are large.

When $n_p$ is large and $P_s$ is not too close to 0, we notice that, as long as $V^{(2)}_t - V^{(2)}_n > 0$ and $V^{(1)}_n \le V^{(2)}_t - V^{(2)}_n - 1$, the two-level Hamming associative memory always returns the target memory pattern. We can then obtain Eq. (14) by making use of Eqs. (18) and (19).

When $\frac{M_1M_2}{R_1R_2} - n_p$ is large and $|P_n - P_t|$ is not too close to 1, we notice that, as long as $V^{(2)}_n - V^{(2)}_t < -(V^{(1)}_n - V^{(1)}_t)$, the two-level Hamming associative memory always returns the target memory pattern. We can then obtain Eq. (15) by using Eqs. (17) and (20).

When both $n_p$ and $\frac{M_1M_2}{R_1R_2} - n_p$ are large, and neither $P_s$ nor $|P_n - P_t|$ is too close to 1, because both $V^{(1)}_n - V^{(1)}_t$ and $V^{(2)}_n - V^{(2)}_t$ follow a normal distribution, $V_n - V_t = (V^{(1)}_n - V^{(1)}_t) + (V^{(2)}_n - V^{(2)}_t)$ must also follow a normal distribution $N(\mu, \delta^2)$, where
$$\mu = \left(\frac{M_1M_2}{R_1R_2} - n_p\right)(P_n - P_t) + n_p P_s,$$
$$\delta^2 = \left(\frac{M_1M_2}{R_1R_2} - n_p\right)\left(P_n + P_t - (P_n - P_t)^2\right) + n_p P_s(1 - P_s).$$
As $V_n - V_t < 0$ means that the two-level Hamming associative memory returns the target memory pattern, Eq. (16) can now be obtained. □

3.3. Conclusion

Figs. 2–4 illustrate the capacities of the two different Hamming associative memories with different amounts of concentrated-noise-polluted cells and different region sizes, respectively when uniform noise is absent, when a certain amount of uniform noise is present, and when a large amount of uniform noise is present.


Fig. 2. Capacities of Hamming associative memories with different amounts of concentrated-noise-polluted cells and different region sizes, when uniform noise is absent. The capacity of an associative memory is defined as the maximum number of memory patterns to be stored reliably at a fixed noise level (Kawamura & Hirai, 1997). It can be estimated by computing the probability, as shown here, with which the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise. In this figure, p represents the probability that the value of each bit of a memory pattern gets changed due to uniform random noise; p = 0 means that the uniform noise is absent. M1 × M2 represents the size of each memory pattern. N1 × N2 represents the size of the noise blocks, the union of which defines the concentrated noise.

Fig. 3. Capacities of Hamming associative memories with different amounts of concentrated-noise-polluted cells and different region sizes, when a certain amount of uniform noise is present. The capacity of an associative memory is defined as the maximum number of memory patterns to be stored reliably at a fixed noise level (Kawamura & Hirai, 1997). It can be estimated by computing the probability, as shown here, with which the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise. In this figure, p represents the probability that the value of each bit of a memory pattern gets changed due to uniform random noise; p = 0 means that the uniform noise is absent. M1 × M2 represents the size of each memory pattern. N1 × N2 represents the size of the noise blocks, the union of which defines the concentrated noise.


Fig. 4. Capacities of Hamming associative memories with different amounts of concentrated-noise-polluted cells and different region sizes, when a large amount of uniform noise is present. The capacity of an associative memory is defined as the maximum number of memory patterns to be stored reliably at a fixed noise level (Kawamura & Hirai, 1997). It can be estimated by computing the probability, as shown here, with which the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise. In this figure, p represents the probability that the value of each bit of a memory pattern gets changed due to uniform random noise; p = 0 means that the uniform noise is absent. M1 × M2 represents the size of each memory pattern. N1 × N2 represents the size of the noise blocks, the union of which defines the concentrated noise.

7 The training and test sets in the experiments in Ikeda et al. (2001) are artificially created according to the assumptions of the model; they should not be used to verify the correctness of the model itself.

Fig. 5 shows the capacities with different amounts of uniform noise and different region sizes, when some concentrated noise is present. All these figures show clearly that the two-level Hamming associative memory always has a larger capacity than the one-level Hamming associative memory.

Notice that, in Fig. 5, the amount of concentrated noise $S$ is much smaller than the amount of uniform noise $pM_1M_2$.

The improved capacity of the two-level Hamming associative memory can also be seen in Fig. 6, which illustrates the capacities of one-level and two-level Hamming associative memories with different uniform noise levels $p$ and different amounts of concentrated noise $S$.

We should mention that it is hard to see any difference between the one-level and two-level Hamming associative memories in these figures for relatively small amounts of concentrated noise $S$. However, there is some difference, although it is extremely small when the memory pattern size $M_1M_2$ is large.

When the memory pattern size $M_1M_2$ is not too large, $S$ is extremely small and $p$ is large, we can see from Fig. 7 that the capacity of the two-level Hamming associative memory is a little smaller than that of the one-level Hamming associative memory. We must point out that, even though this deterioration of the capacity against uniform noise is very small, it matters if there are a lot of memory patterns in the memory set.

This is actually why the experimental analysis in Ikeda et al. (2001) shows a substantially lower capacity for the two-level Hamming associative memory in comparison with the one-level Hamming associative memory in the case where only uniform random noise is present.⁷ However, as can be seen in Figs. 3–5, if the amount of concentrated noise is not extremely small as in Fig. 7, even if it is still far smaller than the amount of uniform noise $pM_1M_2$, the deteriorated capacity of the two-level Hamming associative memory against uniform noise is more than compensated by its substantially increased capacity against concentrated noise.
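Since, under the pair-wise best scheme, the retrieval probability with m stored patterns is the (m − 1)-th power of the two-pattern probability (Section 3), the capacity at a given reliability threshold follows immediately. The helper below is our own sketch; prob_two_patterns would be the value produced, for example, by the prob_two_level sketch given after Theorem 2, and the reliability threshold is an arbitrary choice.

    import numpy as np

    def capacity(prob_two_patterns, reliability=0.99, m_max=10**6):
        # Largest m with prob_two_patterns ** (m - 1) >= reliability.
        if prob_two_patterns >= 1.0:
            return m_max
        if prob_two_patterns <= 0.0:
            return 1
        m = int(np.floor(np.log(reliability) / np.log(prob_two_patterns))) + 1
        return min(max(m, 1), m_max)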

4. Experiment

An experiment has been set up using 2 sets of frontal-view facial pictures of 100 persons; each set contains 100 pictures of different persons. These two sets of frontal images are chosen from image sets fa and fb of the FERET dataset (Phillips, Wechsler, Huang, & Rauss, 1998), which is a benchmark database for facial recognition.


Fig. 5. Capacities of Hamming associative memories with different amounts of uniform noise and different region sizes, when some concentrated noise is present. The capacity of an associative memory is defined as the maximum number of memory patterns to be stored reliably at a fixed noise level (Kawamura & Hirai, 1997). It can be estimated by computing the probability, as shown here, with which the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise. In this figure, p represents the probability that the value of each bit of a memory pattern gets changed due to uniform random noise. M1 × M2 represents the size of each memory pattern. N1 × N2 represents the size of the noise blocks, the union of which defines the concentrated noise. S represents the total size of the noise blocks.

We crop and scale each face into a standard size of 72 × 72 pixels, using the positions of the eyes, such that the line between the two eyes is parallel to the horizontal axis; the inter-ocular distance is set to be 38 pixels.

The purpose of this experiment is not to show the advantage of associative memory over other methods for facial recognition, but to investigate the capacities of one-level and two-level associative memories in real applications.

We use one set as the memory set, and the other set as the test set.

In order to make the associative memories robust to small shifts, we use a shifting approach. When computing the local distance between a sub-memory of the input pattern and a sub-memory of a memory pattern for the two-level model, we perturb the position of the window a few pixels in each direction, compute the distance in all nearby positions, and then select the smallest among these candidate distances as the local distance. Correspondingly, when computing the distance between the input pattern and a memory pattern for the one-level model, we list all the windows of size 66 × 66 in the input and in the memory image and compute the distances between windows in the input and windows in the memory image. Among these many different distances, we select the smallest as the distance between the input and the memory image.
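A sketch of this shift-tolerant local distance is given below; it is our own illustration, and the function name and the shift radius (up to 3 pixels in each direction, consistent with 66 × 66 windows inside a 72 × 72 image for the one-level model) are our assumptions.

    import numpy as np

    def shifted_local_distance(memory, key, r, c, R1, R2, max_shift=3):
        # City block distance between the key window at (r, c) and the memory,
        # minimised over small shifts of the window position inside the memory image.
        M1, M2 = memory.shape
        sub_key = key[r:r + R1, c:c + R2].astype(int)
        best = None
        for dr in range(-max_shift, max_shift + 1):
            for dc in range(-max_shift, max_shift + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr and rr + R1 <= M1 and 0 <= cc and cc + R2 <= M2:
                    window = memory[rr:rr + R1, cc:cc + R2].astype(int)
                    dist = int(np.abs(window - sub_key).sum())
                    best = dist if best is None else min(best, dist)
        return best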

We compare the recognition performance of the one-level and two-level Hamming associative memories. Table 1 shows the recognition rate for different window sizes. This shows that the two-level Hamming associative memory works much better than the one-level Hamming associative memory.

5. Concluding remarks

We have shown in this paper that the two-level Hamming associative memory has a higher capacity than the one-level Hamming associative memory against concentrated noise at any intensity. We also showed that, although the two-level Hamming associative memory has a slightly lower capacity against uniform random noise, this deteriorated capacity is more than compensated by its substantially increased capacity against concentrated noise, even if the amount of concentrated noise is not high. The presence of concentrated noise is very common in practice. We believe that the presence of purely uniform noise is very rare in practice: these "pure" situations are unlikely to occur, because such a situation actually means that the noise is dispersed uniformly over the whole pattern without any exception, which makes the noise too regular and too predictable to be "random" noise.


Fig. 6. Capacities of Hamming associative memories with different total sizes of noise blocks and different amounts of uniform noise. The capacity of an associative memory is defined as the maximum number of memory patterns to be stored reliably at a fixed noise level (Kawamura & Hirai, 1997). It can be estimated by computing the probability, as shown here, with which the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise. In this figure, p represents the probability that the value of each bit of a memory pattern gets changed due to uniform random noise; p = 0 means that the uniform noise is absent. M1 × M2 represents the size of each memory pattern. R1 × R2 represents the size of each sub-memory pattern. N1 × N2 represents the size of the noise blocks, the union of which defines the concentrated noise. S represents the total size of the noise blocks.

Table 1
Performances of one-level and two-level Hamming associative memory for facial recognition

Window size (two-level)   3 × 3   6 × 6   9 × 9   12 × 12   18 × 18   One-level
Recognition rate          84%     89%     92%     97%       87%       86%

In order to make the associative memories robust to small shifts, a shifting approach is used here. When computing the local distance between a sub-memory of the input pattern and a sub-memory of a memory pattern for the two-level model, we perturb the position of the window a few pixels in each direction, compute the distance in all nearby positions, and then select the smallest among these candidate distances as the local distance. Correspondingly, when computing the distance between the input pattern and a memory pattern for the one-level model, we list all the windows of size 66 × 66 in the input and in the memory image and compute the distances between windows in the input and windows in the memory image. Among these many different distances, we select the smallest as the distance between the input and the memory image.

The superiority of the two-level Hamming associative memory is evident.

The experiment on a real dataset has clearly verified our theoretical conclusion.

While the two-level Hamming associative memory can be taken as a divide-and-conquer approach, it is radically different from the M3 neural network proposed by Lu and Ito (1999), where a multi-label classification problem is partitioned into a series of bi-classification problems, each of which is solved by a neural network. We divide the input memory keys, while Lu and Ito (1999) partition the output label sets. But it should be understandable that these two approaches could be mixed together, by firstly partitioning a class label set with Lu and Ito's approach to generate many bi-classification problems, and then solving each of the bi-classification problems by the two-level Hamming associative memory approach.

Recently, Cao, Murata, Amari, Cichocki, and Takeda (2003) have successfully used a PCA-based pre-whitening technique to reduce noise before further separation of different signal components in magnetoencephalography (MEG) data analysis. It may be very useful, and can substantially increase the accuracy, to use a similar technique to reduce the noise of a memory key before employing a two-level associative memory to determine its label. As the advantage of the two-level associative memory comes from the fact that it can localize the effects of concentrated noise into a restricted number of windows,


Fig. 7. Capacities of Hamming associative memories with different total sizes of noise blocks and different region sizes, when concentrated noise is absent and the entire memory pattern size M1M2 is small. The capacity of an associative memory is defined as the maximum number of memory patterns to be stored reliably at a fixed noise level (Kawamura & Hirai, 1997). It can be estimated by computing the probability, as shown here, with which the associative memory returns the target memory for a fixed number of memory patterns at a fixed amount of noise. In this figure, p represents the probability that the value of each bit of a memory pattern gets changed due to uniform random noise. M1 × M2 represents the size of each memory pattern. N1 × N2 represents the size of the noise blocks, the union of which defines the concentrated noise. S represents the total size of the noise blocks.

and it does not show any advantage if only uniform random noise is present, a careful analysis of the features of the remaining components should be interesting and important if we want to use noise reduction as the first stage of recognition.
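A generic illustration of this idea is sketched below. It is not the method of Cao et al. (2003), whose robust pre-whitening was developed for independent component analysis of MEG data; it is merely a simple PCA reconstruction of a memory key from the leading principal components of the stored patterns, using scikit-learn, with the component count and the 0.5 threshold being arbitrary assumptions of ours.

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_denoise(memories, key, n_components=16):
        # Fit PCA on the stored patterns, project the key onto the leading
        # components and reconstruct it, then threshold back to a 0-1 pattern
        # before handing it to the associative memory.
        m = memories.shape[0]
        X = memories.reshape(m, -1).astype(float)
        pca = PCA(n_components=min(n_components, m)).fit(X)
        recon = pca.inverse_transform(pca.transform(key.reshape(1, -1).astype(float)))
        return (recon.reshape(key.shape) > 0.5).astype(int)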

Acknowledgements

The authors are most grateful to the reviewers for many helpful comments on the original version of this paper.

The authors appreciate the help of Mr. R. Chen at Jilin University in implementing the experiments.

This research of the first author is supported by a Discovery Grant of NSERC, Canada. Portions of the research in this paper use the FERET database of facial images collected under the FERET program.

A preliminary version of this paper was presented at the 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, in May 2005.

References

Arrow, K. J. (1963). Social choice and individual values (2nd ed.). Yale University Press.

Cao, J., Murata, N., Amari, S., Cichocki, A., & Takeda, T. (2003). A robust approach to independent component analysis of signals with high-level noise measurements. IEEE Transactions on Neural Networks, 14(3), 631–645.

Chen, L., & Tokuda, N. (2003). Stability analysis of regional and national voting schemes by a continuous model. IEEE Transactions on Knowledge and Data Engineering, 15(4), 1037–1042.

Chiueh, T., & Goodman, R. (1991). Recurrent correlation associative memories. IEEE Transactions on Neural Networks, 2(2), 275–284.

Chou, P. A. (1989). The capacity of the Kanerva associative memory. IEEE Transactions on Information Theory, 35(2), 281–298.

Hassoun, M. H. (Ed.), (1993). Associative neural memories: Theory and implementation. New York: Oxford University Press.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79, 2554–2558.

Ikeda, N., Watta, P., Artiklar, M., & Hassoun, M. (2001). A two-level Hamming network for high performance associative memory. Neural Networks, 14(9), 1189–1200.

Ikeda, N., Watta, P., & Hassoun, M. (1998). Capacity analysis of the two-level decoupled Hamming associative memory. In Proceedings of international joint conference on neural networks (pp. 486–491).

Kawamura, M., & Hirai, Y. (1997). Storage capacity analysis on a model of human associative processing, HASP. Systems and Computers in Japan, 23(1), 24–33.

Lu, B.-L., & Ito, M. (1999). Task decomposition and module combination based on class relations: A modular neural network for pattern classification. IEEE Transactions on Neural Networks, 10(5), 1244–1256.


Luce, R. D., & Raiffa, H. (1957). Games and decisions: Introduction and critical survey. John Wiley and Sons.

Phillips, P., Wechsler, H., Huang, J., & Rauss, P. (1998). The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing, 16(5), 295–306.

Watta, P., Wang, K., & Hassoun, M. (1997). Recurrent neural nets as dynamical Boolean systems with application to associative memory. IEEE Transactions on Neural Networks, 8(6), 1268–1280.