
Contributed article

Complexity preserving increase of the capacity of bidirectional associative memories by dilation and translation

Burkhard Lenze*
Fachbereich Informatik, Fachhochschule Dortmund, Postfach 105018, D-44047 Dortmund, Germany

Received 12 September 1997; accepted 17 April 1998

Neural Networks 11 (1998) 1041–1048

    Abstract

In this paper, we show how to increase the capacity of Kosko-type bidirectional associative memories by introducing dilation and translation parameters in the pattern space. The essential point of this approach is that the increase of the capacity of the networks almost doesn't affect their complexity. The general idea of the proposed strategy is to tune the networks in such a way that they are better suited to store patterns which are highly correlated without touching their ability to recover easy-to-store information. A detailed example at the end of the paper shows how our approach works in practice. © 1998 Elsevier Science Ltd. All rights reserved.

    Keywords: Dilation; Translation; Bidirectional associative memories; Kosko-type networks

    1. Introduction

As is well known, the standard problem in associative memory design is to store a set of, let's say, $t$ bipolar coded patterns

$$(x^{(s)}, y^{(s)}) \in \{-1,1\}^n \times \{-1,1\}^m, \quad 1 \le s \le t \qquad (1)$$

More precisely, the device should be constructed in such a way that entering some $x^{(s)}$ (or a slightly disturbed version of $x^{(s)}$) results in the proper $y^{(s)}$ as output (cf. Kohonen, 1984; Kamp and Hasler, 1990; or Hecht-Nielsen, 1990, Chapter 4, for general information about associative memories resp. networks). In case of linear or even nonlinear unidirectional associative memories or Kosko-type bidirectional associative memories (cf. Grossberg, 1968; Anderson, 1972; Kohonen, 1972; Cohen and Grossberg, 1983; Kosko, 1987; Kosko, 1988; Hassoun, 1989; Wang et al., 1990; Wang et al., 1991; Srinivasan and Chia, 1991; Shanmukh and Venkatesh, 1993; Zhang et al., 1993; Leung, 1993; Leung, 1994 and, finally, Wang et al., 1994) the classical Hebb learning rule

$$w_{ij} := \sum_{s=1}^{t} x_i^{(s)} y_j^{(s)}, \quad 1 \le i \le n,\ 1 \le j \le m \qquad (2)$$

usually guarantees perfect recall in case that the input patterns are pairwise orthogonal,

$$\sum_{i=1}^{n} x_i^{(s)} x_i^{(r)} = 0, \quad r \ne s,\ 1 \le r, s \le t \qquad (3)$$
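For concreteness, the classical Hebb rule (2) and the orthogonality condition (3) can be sketched in NumPy as follows; this sketch is not part of the original paper, and the example patterns are hypothetical.

```python
import numpy as np

def hebb_weights(X, Y):
    """Classical Hebb rule (2): w_ij = sum_s x_i^(s) * y_j^(s).
    X has shape (t, n), Y has shape (t, m); entries are in {-1, +1}."""
    return X.T @ Y                      # (n, m) connection weight matrix

def input_gram(X):
    """Pairwise inner products of the input patterns; vanishing off-diagonal
    entries correspond to the orthogonality condition (3)."""
    return X @ X.T

# Hypothetical example: two mutually orthogonal input patterns.
X = np.array([[ 1,  1,  1,  1],
              [ 1, -1,  1, -1]])
Y = np.array([[ 1, -1],
              [-1,  1]])
W = hebb_weights(X, Y)
print(input_gram(X))                    # off-diagonal entries are 0
```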

Unfortunately, in general this is not the case in practice. Therefore, our idea is to reorthogonalize especially highly correlated information by appropriate dilation and translation of the patterns. This idea was originally introduced in Lenze (1996, 1998) for the Hopfield-type setting (see also Hopfield, 1982; McEliece et al., 1987; and Kamp and Hasler, 1990 for details about Hopfield-type neural networks). In this context, our precise problem is as follows: find dilation parameters $a_i \in \mathbb{R}$, $1 \le i \le n$, and translation parameters $d_i \in \mathbb{R}$, $1 \le i \le n$, for the input pattern space (and, for symmetry, later on also for the output pattern space) such that the ratio

$$\frac{\sum_{1 \le r < s \le t} \left| \sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) \right|}{\sum_{s=1}^{t} \sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)^2} \qquad (4)$$

becomes minimal. In case that the initial training set satisfies Eq. (3) the optimal choice would be $a_i = 1$, $d_i = 0$, $1 \le i \le n$, which would result in 0 for the ratio Eq. (4). In general, however, Eq. (3) is not fulfilled and, therefore, the stated problem may be interpreted as an attempt to make the initial training set more orthogonal by appropriate dilation and translation.

* Corresponding author. Tel.: 0049 231 9112 362; Fax: 0049 231 9112 313; E-mail: lenze@fh-dortmund.de


Obviously, for given input patterns $x^{(s)} \in \{-1,1\}^n$, $1 \le s \le t$, the above minimization problem is highly nonlinear and no easy algorithm is known to fix the best $a_i, d_i \in \mathbb{R}$, $1 \le i \le n$, at once. Moreover, we have to take care that scaling the pattern space can be properly implemented in the standard associative memory context without, for example, destroying such basic features as the stable state property in the recursive Kosko-type setting. Therefore, we will proceed as follows. In the next section, we will show that under a positivity assumption on the dilation parameters a generalized Kosko-type network can be established including the stable state property. At this stage of the discussion, the dilation and translation parameters $a_i, d_i \in \mathbb{R}$, $1 \le i \le n$, are still quite general and not designed to approximately minimize Eq. (4). As already noted, the explicit solution of the minimization problem is quite difficult and time-consuming. Therefore, we will only develop an easy heuristic strategy to fix the dilation and translation parameters in such a way that Eq. (4) at least becomes smaller compared to the classical choice $a_i = 1$ and $d_i = 0$, $1 \le i \le n$. During this consideration we will see that we end up with pairs of parameters which have already proved to be well suited in connection with incomplete sigma-pi neural networks of Hopfield type (cf. Lenze, 1998). At the end of the paper, we will apply our dilation and translation approach to some special examples and summarize our basic ideas in a compact form.
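The ratio (4) is easy to evaluate numerically. The following sketch is only illustrative and not part of the original paper: the test patterns and the dilation/translation choice anticipate Theorem 2.3 below and are not prescribed at this point.

```python
import numpy as np

def orthogonality_ratio(X, a, d):
    """Ratio (4): summed absolute cross inner products of the dilated and
    translated patterns a_i*x_i^(s) + d_i, divided by the sum of their
    squared norms.  X: (t, n) bipolar patterns; a, d: length-n vectors."""
    Z = X * a + d                               # transformed patterns, shape (t, n)
    G = Z @ Z.T                                 # Gram matrix
    cross = np.abs(np.triu(G, k=1)).sum()       # sum over 1 <= r < s <= t
    norms = np.trace(G)                         # sum_s sum_i (a_i x_i^(s) + d_i)^2
    return cross / norms

# Hypothetical, highly correlated patterns: each has exactly one -1 entry.
n = 5
X = np.ones((n, n))
np.fill_diagonal(X, -1)

# Classical choice a_i = 1, d_i = 0 ...
print(orthogonality_ratio(X, np.ones(n), np.zeros(n)))   # 0.4

# ... versus the dilation/translation choice of Theorem 2.3 below.
lam = (n - 4) / ((n - 2) * np.sqrt(8 * (n - 2)))
d = -lam * X.sum(axis=0)
a = np.sqrt(1 + d ** 2)
print(orthogonality_ratio(X, a, d))                      # ~0.0
```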

2. Bidirectional associative memories with dilation and translation

A standard bidirectional associative memory in the sense of Kosko (1987, 1988) (in the following shortly called BAM) consists of a finite number of input units (input neurons) which have bidirectional connections to a finite number of output units (output neurons). No other neuron-neuron connections exist. The dynamic states of such a device are essentially ruled by the following signum-type transfer function $T_S: \mathbb{R} \to \{-1, 1, \mathrm{n.d.}\}$,

$$T_S(y) := \begin{cases} -1, & y < 0, \\ \mathrm{n.d.}, & y = 0, \\ 1, & y > 0. \end{cases} \qquad (5)$$

Here, the abbreviation n.d. means that for $y = 0$ the function $T_S$ is not defined. In case that the BAM has precisely $n$ input units and $m$ output units, the network should be able to associate a finite number of, let's say, $t$ bipolar coded vectors $x^{(s)} \in \{-1,1\}^n$ with their respective counterparts $y^{(s)} \in \{-1,1\}^m$, $1 \le s \le t$, even in case of slightly disturbed input information (cf. Kamp and Hasler, 1990 or Hecht-Nielsen, 1990, p. 103, for a more gentle introduction to BAMs). In the following, we will introduce the recall and learning mode for a generalized version of BAMs including dilation and translation of the input and output pattern space.

Definition 2.1. (recall mode). Let $n, m \in \mathbb{N}$ be given arbitrarily. The generalized BAM may have $n$ input neurons, $m$ output neurons, and $mn$ connection weights $w_{ij} \in \mathbb{R}$, $1 \le i \le n$, $1 \le j \le m$. Moreover, it may have $n$ input dilation and translation weights $a_i, d_i \in \mathbb{R}$, $a_i > 0$, $1 \le i \le n$, and $m$ output dilation and translation weights $b_j, f_j \in \mathbb{R}$, $b_j > 0$, $1 \le j \le m$. We assume that all weights (connection, dilation, translation) are generated by means of any reasonable learning algorithm (for example, the one given in Definition 2.3), and the only assumption on the weights we make is the positivity of the dilation weights. Now, in case that an input vector $x^{[0]}$ is entered,

$$x^{[0]} = (x_1^{[0]}, x_2^{[0]}, \ldots, x_n^{[0]}) \in \{-1,1\}^n \qquad (6)$$

and $y^{[-1]}$ is defined as

$$y^{[-1]} := (-1, -1, \ldots, -1) \in \{-1,1\}^m \qquad (7)$$

the generalized BAM generates a sequence of vectors $(x^{[u]}, y^{[u]})_{u \in \mathbb{N}_0}$ via

$$y_j^{[u]} := \begin{cases} T_S\!\left( \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u]} + d_i) \right) & \text{for } \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u]} + d_i) \ne 0, \\[6pt] y_j^{[u-1]} & \text{for } \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u]} + d_i) = 0, \end{cases} \qquad (8)$$

for $1 \le j \le m$ and

$$x_i^{[u+1]} := \begin{cases} T_S\!\left( \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right) & \text{for } \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \ne 0, \\[6pt] x_i^{[u]} & \text{for } \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) = 0, \end{cases} \qquad (9)$$

for $1 \le i \le n$ and $u \in \mathbb{N}_0$. As the output vector, the so-defined recall mode yields the vector $y^{[v]} \in \{-1,1\}^m$ which for the first time satisfies

$$y^{[v]} = y^{[v+1]} \qquad (10)$$

for some $v \in \mathbb{N}_0$.
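A minimal NumPy sketch of this recall mode might look as follows; it is not part of the original paper, and the function and variable names are illustrative.

```python
import numpy as np

def ts(v):
    """Signum-type transfer function T_S of Eq. (5); the zero case is
    handled by the caller, as required by Eqs. (8) and (9)."""
    return np.where(v > 0, 1.0, -1.0)

def bam_recall(W, a, d, b, f, x0, max_iter=1000):
    """Sketch of the generalized BAM recall mode of Definition 2.1.

    W: (n, m) connection weights; a, d: input dilation/translation (a > 0);
    b, f: output dilation/translation (b > 0); x0: bipolar input vector x^[0].
    Returns the first y^[v] with y^[v] = y^[v+1] (Eq. (10))."""
    x = np.asarray(x0, dtype=float)
    y_prev = -np.ones(W.shape[1])                  # y^[-1] := (-1, ..., -1), Eq. (7)
    for u in range(max_iter):
        s_y = (a * x + d) @ W                      # net input to the output layer
        y = np.where(s_y != 0, ts(s_y), y_prev)    # Eq. (8): ties keep y^[u-1]
        if u > 0 and np.array_equal(y, y_prev):    # stable state reached, Eq. (10)
            return y
        s_x = W @ (b * y + f)                      # net input back to the input layer
        x = np.where(s_x != 0, ts(s_x), x)         # Eq. (9): ties keep x^[u]
        y_prev = y
    return y_prev
```

By Theorem 2.2 below, the iteration reaches a stable state after finitely many steps whenever all dilation weights are positive, so the max_iter bound only serves as a safety net.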

Remarks. (1) Note that for $a_i = 1$, $d_i = 0$, $1 \le i \le n$, and $b_j = 1$, $f_j = 0$, $1 \le j \le m$, the above definition reduces to the standard BAM recall mode. (2) The above definition implicitly states that the recall mode terminates at a stable state. We will prove this in the following theorem.

Theorem 2.2. (stable state theorem). Let $n, m \in \mathbb{N}$ be given arbitrarily and let the generalized BAM be initialized exactly according to Definition 2.1. Now, in case that an input vector $x^{[0]}$ is entered,


$$x^{[0]} = (x_1^{[0]}, x_2^{[0]}, \ldots, x_n^{[0]}) \in \{-1,1\}^n \qquad (11)$$

and $y^{[-1]}$ is defined as

$$y^{[-1]} := (-1, -1, \ldots, -1) \in \{-1,1\}^m \qquad (12)$$

there exists a natural number $v \in \mathbb{N}_0$ such that for the sequence of vectors $y^{[u]} \in \{-1,1\}^m$, $u \in \mathbb{N}_0$, generated by the generalized BAM recall mode we have

$$y^{[v]} = y^{[v+u]}, \quad u \in \mathbb{N}_0 \qquad (13)$$

Proof. Let the sequence of vectors

$$(x^{[u]}, y^{[u]}) \in \{-1,1\}^n \times \{-1,1\}^m, \quad u \in \mathbb{N}_0 \qquad (14)$$

be generated according to Definition 2.1. For two arbitrary bipolar coded vectors $(x, y) \in \{-1,1\}^n \times \{-1,1\}^m$, we introduce the so-called energy functional $E$,

$$E(x, y) := -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}(a_i x_i + d_i)(b_j y_j + f_j) \qquad (15)$$

In a first step, we compare the energies $E(x^{[u]}, y^{[u]})$ and $E(x^{[u+1]}, y^{[u]})$, where $u \in \mathbb{N}_0$ may be given arbitrarily:

$$\begin{aligned}
E(x^{[u+1]}, y^{[u]}) - E(x^{[u]}, y^{[u]}) &= -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j)\bigl(a_i x_i^{[u+1]} - a_i x_i^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le i \le n \\ x_i^{[u+1]} \ne x_i^{[u]}}} a_i \left( \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right) \bigl(x_i^{[u+1]} - x_i^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le i \le n \\ x_i^{[u+1]} \ne x_i^{[u]}}} 2 a_i \left| \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right| \\
&\le 0 \qquad (16)
\end{aligned}$$

The two final steps hold because $a_i > 0$ by presupposition, $x_i^{[u+1]} \ne x_i^{[u]}$ since we only considered those neurons in the sum which changed their states, and, finally,

$$x_i^{[u+1]} = T_S\!\left( \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right) \qquad (17)$$

Therefore, we obtain

$$E(x^{[u+1]}, y^{[u]}) \le E(x^{[u]}, y^{[u]}) \qquad (18)$$

and, more precisely,

$$E(x^{[u+1]}, y^{[u]}) < E(x^{[u]}, y^{[u]}) \qquad (19)$$

in case that there is at least one index $i \in \{1, 2, \ldots, n\}$ with

$$x_i^{[u+1]} \ne x_i^{[u]} \qquad (20)$$

Going on, we now compare the energies $E(x^{[u+1]}, y^{[u]})$ and $E(x^{[u+1]}, y^{[u+1]})$:

$$\begin{aligned}
E(x^{[u+1]}, y^{[u+1]}) - E(x^{[u+1]}, y^{[u]}) &= -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}(a_i x_i^{[u+1]} + d_i)\bigl(b_j y_j^{[u+1]} - b_j y_j^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le j \le m \\ y_j^{[u+1]} \ne y_j^{[u]}}} b_j \left( \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u+1]} + d_i) \right) \bigl(y_j^{[u+1]} - y_j^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le j \le m \\ y_j^{[u+1]} \ne y_j^{[u]}}} 2 b_j \left| \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u+1]} + d_i) \right| \\
&\le 0 \qquad (21)
\end{aligned}$$

Here, the two final steps hold because $b_j > 0$ by presupposition, $y_j^{[u+1]} \ne y_j^{[u]}$ since we only considered those neurons in the sum which changed their states, and, finally,

$$y_j^{[u+1]} = T_S\!\left( \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u+1]} + d_i) \right) \qquad (22)$$

Again, we obtain

$$E(x^{[u+1]}, y^{[u+1]}) \le E(x^{[u+1]}, y^{[u]}) \qquad (23)$$

and, more precisely,

$$E(x^{[u+1]}, y^{[u+1]}) < E(x^{[u+1]}, y^{[u]}) \qquad (24)$$

if there exists at least one index $j \in \{1, 2, \ldots, m\}$ with

$$y_j^{[u+1]} \ne y_j^{[u]} \qquad (25)$$

Summing up, we can conclude that for $u \in \mathbb{N}_0$ we always have

$$E(x^{[u+1]}, y^{[u+1]}) < E(x^{[u]}, y^{[u]}) \qquad (26)$$

if there exists at least one index $i \in \{1, 2, \ldots, n\}$ with

$$x_i^{[u+1]} \ne x_i^{[u]} \qquad (27)$$

or an index $j \in \{1, 2, \ldots, m\}$ with

$$y_j^{[u+1]} \ne y_j^{[u]} \qquad (28)$$

Since the set $\{-1,1\}^n \times \{-1,1\}^m$ is finite, the energy functional $E$ can properly decrease on $(x^{[u]}, y^{[u]}) \in \{-1,1\}^n \times \{-1,1\}^m$ only a finite number of times.


Therefore, there must exist a natural number $v \in \mathbb{N}_0$ satisfying

$$E(x^{[v+1]}, y^{[v+1]}) = E(x^{[v]}, y^{[v]}) \qquad (29)$$

Using the above arguments this immediately implies

$$y^{[v]} = y^{[v+1]} \qquad (30)$$

resp.

$$y^{[v]} = y^{[v+u]}, \quad u \in \mathbb{N}_0 \qquad (31)$$

□
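The energy functional (15) used in this proof reduces to a one-line NumPy expression; tracking it along the iterations of the recall sketch given after Definition 2.1 makes the monotone decrease of Eqs. (18) and (23) visible. The helper below is illustrative and not part of the original paper.

```python
import numpy as np

def bam_energy(W, a, d, b, f, x, y):
    """Energy functional E of Eq. (15) for a bipolar state pair (x, y):
    E(x, y) = -sum_ij w_ij (a_i x_i + d_i)(b_j y_j + f_j)."""
    return -float((a * x + d) @ W @ (b * y + f))
```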

At the moment, the connection, dilation and translation weights are not yet tuned in order to store any bipolar associations as given by Eq. (1). This will be done in the following definition.

Definition 2.3. (learning mode). Let $n, m \in \mathbb{N}$ be given arbitrarily. The generalized BAM may have $n$ input neurons, $m$ output neurons, and $mn$ connection weights $w_{ij} \in \mathbb{R}$, $1 \le i \le n$, $1 \le j \le m$. Moreover, it may have $n$ input dilation and translation weights $a_i, d_i \in \mathbb{R}$, $a_i > 0$, $1 \le i \le n$, and $m$ output dilation and translation weights $b_j, f_j \in \mathbb{R}$, $b_j > 0$, $1 \le j \le m$. Now, in case that $t$ bipolar coded associations

$$(x^{(s)}, y^{(s)}) \in \{-1,1\}^n \times \{-1,1\}^m, \quad 1 \le s \le t \qquad (32)$$

are entered into the network in order to be stored, then the algorithm which sets

$$d_i := -\lambda \sum_{s=1}^{t} x_i^{(s)}, \quad a_i := \sqrt{1 + d_i^2}, \quad 1 \le i \le n \qquad (33)$$

$$f_j := -\mu \sum_{s=1}^{t} y_j^{(s)}, \quad b_j := \sqrt{1 + f_j^2}, \quad 1 \le j \le m \qquad (34)$$

$$w_{ij} := \sum_{s=1}^{t} (a_i x_i^{(s)} + d_i)(b_j y_j^{(s)} + f_j), \quad 1 \le i \le n,\ 1 \le j \le m \qquad (35)$$

with $\lambda, \mu \ge 0$ free learning parameters, is called the generalized Hebb learning scheme.

Remarks. (1) Note that for $\lambda = \mu = 0$ or $d_i = 0 = f_j$, $1 \le i \le n$, $1 \le j \le m$, the generalized Hebb learning scheme defined above reduces to the usual Hebb learning scheme for standard BAMs. (2) In Lenze (1998) a generalized Hebb learning scheme for sigma-pi Hopfield neural networks has been introduced which is quite similar. However, in the sigma-pi setting a proper discrete orthogonality argument was in the background, which is missing in the BAM context.
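A sketch of the generalized Hebb learning scheme of Eqs. (33)-(35), again illustrative rather than authoritative; the function name and array layout are assumptions.

```python
import numpy as np

def generalized_hebb(X, Y, lam=0.0, mu=0.0):
    """Sketch of the generalized Hebb learning scheme (Eqs. (33)-(35)).

    X: (t, n) input patterns, Y: (t, m) output patterns, entries in {-1, +1};
    lam, mu >= 0 are the free learning parameters.
    Returns the weights W and the dilation/translation vectors a, d, b, f."""
    d = -lam * X.sum(axis=0)            # Eq. (33): translation of the input space
    a = np.sqrt(1.0 + d ** 2)           # Eq. (33): dilation, automatically positive
    f = -mu * Y.sum(axis=0)             # Eq. (34): translation of the output space
    b = np.sqrt(1.0 + f ** 2)           # Eq. (34): dilation of the output space
    W = (X * a + d).T @ (Y * b + f)     # Eq. (35): w_ij = sum_s (a_i x_i + d_i)(b_j y_j + f_j)
    return W, a, d, b, f
```

With lam = mu = 0 this reduces to the classical Hebb rule (2), in line with Remark (1) above.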

With Definition 2.3 we now have a learning scheme for our generalized dilation and translation BAMs; however, at the moment we don't have any theoretical arguments for choosing the parameters in this way. A first basic theoretical justification for our choice will be given in the following theorem, which refers to our initial problem, namely, to minimize Eq. (4), especially in case of highly correlated patterns. Since a completely general result cannot be expected because of the nonlinearity of the underlying minimization problem, we focus our attention on a special case of extremely correlated information and discuss more arbitrary situations afterwards.

Theorem 2.3. (reorthogonalization theorem). Let $n \in \mathbb{N}$, $n \ge 3$, be given arbitrarily and let $x^{(s)} \in \{-1,1\}^n$, $1 \le s \le n$, be defined as

$$x_i^{(s)} := \begin{cases} -1, & \text{for } i = s, \\ 1, & \text{for } i \ne s, \end{cases} \qquad 1 \le i \le n,\ 1 \le s \le n \qquad (36)$$

Then, for

$$\lambda := \frac{n-4}{(n-2)\sqrt{8(n-2)}} \qquad (37)$$

$$d_i := -\lambda \sum_{s=1}^{n} x_i^{(s)}, \quad 1 \le i \le n \qquad (38)$$

$$a_i := \sqrt{1 + d_i^2}, \quad 1 \le i \le n \qquad (39)$$

the following estimate holds:

$$\sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) = \begin{cases} 0 & \text{for } r \ne s,\ 1 \le r, s \le n, \\[4pt] \dfrac{n^2}{2(n-2)} & \text{for } r = s,\ 1 \le r, s \le n. \end{cases} \qquad (40)$$

Proof. Because of the very special definition of the $x^{(s)} \in \{-1,1\}^n$, $1 \le s \le n$, given in Eq. (36), the translation parameters $d_i$, $1 \le i \le n$, immediately reduce to

$$d_i = d := -\lambda(n-2), \quad 1 \le i \le n \qquad (41)$$

which implies the unique dilation parameters

$$a_i = a := \sqrt{1 + \lambda^2(n-2)^2}, \quad 1 \le i \le n \qquad (42)$$

With the special setting of $\lambda$ as fixed in Eq. (37) we now obtain the following two cases for $1 \le r, s \le n$.

1. $r \ne s$:

$$\begin{aligned}
\sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) &= 2(-a+d)(a+d) + (n-2)(a+d)^2 \\
&= 2\Bigl(d^2 - \bigl(\sqrt{1+d^2}\bigr)^2\Bigr) + (n-2)\Bigl(\sqrt{1+d^2} + d\Bigr)^2 \\
&= -2 + (n-2)\left( \sqrt{1 + \frac{(n-4)^2}{8(n-2)}} - \frac{n-4}{\sqrt{8(n-2)}} \right)^2 \\
&= -2 + (n-2)\left( 1 + \frac{(n-4)^2}{8(n-2)} - \frac{2(n-4)}{8(n-2)} \sqrt{8(n-2) + (n-4)^2} + \frac{(n-4)^2}{8(n-2)} \right) \\
&= -2 + \frac{1}{4}\Bigl( 4(n-2) + (n-4)^2 - (n-4)n \Bigr) \\
&= 0, \qquad (43)
\end{aligned}$$

where the next-to-last step uses $8(n-2) + (n-4)^2 = n^2$, i.e. $\sqrt{8(n-2) + (n-4)^2} = n$.

2. $r = s$:

$$\begin{aligned}
\sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) &= (-a+d)^2 + (n-1)(a+d)^2 \\
&= \left( \sqrt{1 + \frac{(n-4)^2}{8(n-2)}} + \frac{n-4}{\sqrt{8(n-2)}} \right)^2 + (n-1)\left( \sqrt{1 + \frac{(n-4)^2}{8(n-2)}} - \frac{n-4}{\sqrt{8(n-2)}} \right)^2 \\
&= 1 + \frac{(n-4)^2}{4(n-2)} + \frac{n(n-4)}{4(n-2)} + (n-1)\left( 1 + \frac{(n-4)^2}{4(n-2)} - \frac{n(n-4)}{4(n-2)} \right) \\
&= n + \frac{n(n-4)^2}{4(n-2)} - \frac{(n-2)n(n-4)}{4(n-2)} \\
&= \frac{n^2}{2(n-2)}. \qquad (44)
\end{aligned}$$

□
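The claim of Eq. (40) can also be checked numerically for a concrete $n$; the following sketch is not part of the original paper, and the value $n = 7$ is chosen arbitrarily.

```python
import numpy as np

n = 7                                              # any n >= 3; 7 is chosen arbitrarily
X = np.ones((n, n))
np.fill_diagonal(X, -1)                            # the patterns of Eq. (36)

lam = (n - 4) / ((n - 2) * np.sqrt(8 * (n - 2)))   # Eq. (37)
d = -lam * X.sum(axis=0)                           # Eq. (38)
a = np.sqrt(1 + d ** 2)                            # Eq. (39)

G = (X * a + d) @ (X * a + d).T                    # all inner products of Eq. (40)
off_diag = G - np.diag(np.diag(G))
print(np.allclose(off_diag, 0.0))                  # True: the r != s case of Eq. (40)
print(np.allclose(np.diag(G), n**2 / (2 * (n - 2))))  # True: the r = s case of Eq. (40)
```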

Of course, the above theorem only deals with a very special situation in which even the parameter $\lambda$ can be explicitly fixed in order to reorthogonalize. In general, however, the situation is much more complicated. Therefore, based on recent results we will now give some more heuristic motivation for our learning mode definitions Eqs. (33)-(35). First of all, combining Eqs. (15) and (35), the energy of a pair of patterns $(x^{(r)}, y^{(r)}) \in \{-1,1\}^n \times \{-1,1\}^m$, $1 \le r \le t$, can be calculated via

$$\begin{aligned}
E(x^{(r)}, y^{(r)}) &= -\sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{s=1}^{t} (a_i x_i^{(s)} + d_i)(b_j y_j^{(s)} + f_j)(a_i x_i^{(r)} + d_i)(b_j y_j^{(r)} + f_j) \\
&= -\sum_{s=1}^{t} \left( \sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) \right) \left( \sum_{j=1}^{m} (b_j y_j^{(s)} + f_j)(b_j y_j^{(r)} + f_j) \right) \qquad (45)
\end{aligned}$$

    For a further discussio...