
Contributed article

Complexity preserving increase of the capacity of bidirectional associative memories by dilation and translation

Burkhard Lenze*
Fachbereich Informatik, Fachhochschule Dortmund, Postfach 105018, D-44047 Dortmund, Germany

Received 12 September 1997; accepted 17 April 1998

Neural Networks 11 (1998) 1041–1048

    Abstract

In this paper, we show how to increase the capacity of Kosko-type bidirectional associative memories by introducing dilation and translation parameters in the pattern space. The essential point of this approach is that the increase of the capacity of the networks almost doesn't affect their complexity. The general idea of the proposed strategy is to tune the networks in such a way that they are better suited to store patterns which are highly correlated without touching their ability to recover easy-to-store information. A detailed example at the end of the paper shows how our approach works in practice. © 1998 Elsevier Science Ltd. All rights reserved.

    Keywords: Dilation; Translation; Bidirectional associative memories; Kosko-type networks

    1. Introduction

As is well known, the standard problem in associative memory design is to store a set of, let's say, $t$ bipolar coded patterns

$$(x^{(s)}, y^{(s)}) \in \{-1,1\}^n \times \{-1,1\}^m, \quad 1 \le s \le t \qquad (1)$$

More precisely, the device should be constructed in such a way that entering some $x^{(s)}$ (or a slightly disturbed version of $x^{(s)}$) results in the proper $y^{(s)}$ as output (cf. Kohonen, 1984; Kamp and Hasler, 1990; or Hecht-Nielsen, 1990, Chapter 4, for general information about associative memories resp. networks). In case of linear or even nonlinear unidirectional associative memories or Kosko-type bidirectional associative memories (cf. Grossberg, 1968; Anderson, 1972; Kohonen, 1972; Cohen and Grossberg, 1983; Kosko, 1987; Kosko, 1988; Hassoun, 1989; Wang et al., 1990; Wang et al., 1991; Srinivasan and Chia, 1991; Shanmukh and Venkatesh, 1993; Zhang et al., 1993; Leung, 1993; Leung, 1994 and, finally, Wang et al., 1994) the classical Hebb learning rule

$$w_{ij} := \sum_{s=1}^{t} x_i^{(s)} y_j^{(s)}, \quad 1 \le i \le n,\ 1 \le j \le m \qquad (2)$$

usually guarantees perfect recall in case that the input patterns are pairwise orthogonal,

$$\sum_{i=1}^{n} x_i^{(s)} x_i^{(r)} = 0, \quad r \ne s,\ 1 \le r, s \le t \qquad (3)$$
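For concreteness, the classical Hebb rule (2) and the orthogonality condition (3) can be sketched in NumPy as follows; this sketch is not part of the original paper, and the example patterns are hypothetical.

```python
import numpy as np

def hebb_weights(X, Y):
    """Classical Hebb rule (2): w_ij = sum_s x_i^(s) * y_j^(s).
    X has shape (t, n), Y has shape (t, m); entries are in {-1, +1}."""
    return X.T @ Y                      # (n, m) connection weight matrix

def input_gram(X):
    """Pairwise inner products of the input patterns; vanishing off-diagonal
    entries correspond to the orthogonality condition (3)."""
    return X @ X.T

# Hypothetical example: two mutually orthogonal input patterns.
X = np.array([[ 1,  1,  1,  1],
              [ 1, -1,  1, -1]])
Y = np.array([[ 1, -1],
              [-1,  1]])
W = hebb_weights(X, Y)
print(input_gram(X))                    # off-diagonal entries are 0
```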

Unfortunately, in general this is not the case in practice. Therefore, our idea is to reorthogonalize especially highly correlated information by appropriate dilation and translation of the patterns. This idea was originally introduced in Lenze (1996, 1998) for the Hopfield-type setting (see also Hopfield, 1982; McEliece et al., 1987; and Kamp and Hasler, 1990 for details about Hopfield-type neural networks). In this context, our precise problem is as follows: find dilation parameters $a_i \in \mathbb{R}$, $1 \le i \le n$, and translation parameters $d_i \in \mathbb{R}$, $1 \le i \le n$, for the input pattern space (and, for symmetry, later on also for the output pattern space) such that the ratio

$$\frac{\sum_{1 \le r < s \le t} \left| \sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) \right|}{\sum_{s=1}^{t} \sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)^2} \qquad (4)$$

becomes minimal. In case that the initial training set satisfies Eq. (3) the optimal choice would be $a_i = 1$, $d_i = 0$, $1 \le i \le n$, which would result in 0 for the ratio Eq. (4). In general, however, Eq. (3) is not fulfilled and, therefore, the stated problem may be interpreted as an attempt to make the initial training set more orthogonal by appropriate dilation and translation.

* Corresponding author. Tel.: 0049 231 9112 362; Fax: 0049 231 9112 313; E-mail: lenze@fh-dortmund.de


Obviously, for given input patterns $x^{(s)} \in \{-1,1\}^n$, $1 \le s \le t$, the above minimization problem is highly nonlinear and no easy algorithm is known to fix the best $a_i, d_i \in \mathbb{R}$, $1 \le i \le n$, at once. Moreover, we have to take care that scaling the pattern space can be properly implemented in the standard associative memory context without, for example, destroying such basic features as the stable state property in the recursive Kosko-type setting. Therefore, we will proceed as follows. In the next section, we will show that under a positivity assumption on the dilation parameters a generalized Kosko-type network can be established including the stable state property. At this stage of the discussion, the dilation and translation parameters $a_i, d_i \in \mathbb{R}$, $1 \le i \le n$, are still quite general and not designed to approximately minimize Eq. (4). As already noted, the explicit solution of the minimization problem is quite difficult and time-consuming. Therefore, we will only develop an easy heuristic strategy to fix the dilation and translation parameters in such a way that Eq. (4) at least becomes smaller compared to the classical choice $a_i = 1$ and $d_i = 0$, $1 \le i \le n$. During this consideration we will see that we end up with pairs of parameters which have already proved to be well suited in connection with incomplete sigma-pi neural networks of Hopfield type (cf. Lenze, 1998). At the end of the paper, we will apply our dilation and translation approach to some special examples and summarize our basic ideas in a compact form.
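The ratio (4) is easy to evaluate numerically. The following sketch is only illustrative and not part of the original paper: the test patterns and the dilation/translation choice anticipate Theorem 2.3 below and are not prescribed at this point.

```python
import numpy as np

def orthogonality_ratio(X, a, d):
    """Ratio (4): summed absolute cross inner products of the dilated and
    translated patterns a_i*x_i^(s) + d_i, divided by the sum of their
    squared norms.  X: (t, n) bipolar patterns; a, d: length-n vectors."""
    Z = X * a + d                               # transformed patterns, shape (t, n)
    G = Z @ Z.T                                 # Gram matrix
    cross = np.abs(np.triu(G, k=1)).sum()       # sum over 1 <= r < s <= t
    norms = np.trace(G)                         # sum_s sum_i (a_i x_i^(s) + d_i)^2
    return cross / norms

# Hypothetical, highly correlated patterns: each has exactly one -1 entry.
n = 5
X = np.ones((n, n))
np.fill_diagonal(X, -1)

# Classical choice a_i = 1, d_i = 0 ...
print(orthogonality_ratio(X, np.ones(n), np.zeros(n)))   # 0.4

# ... versus the dilation/translation choice of Theorem 2.3 below.
lam = (n - 4) / ((n - 2) * np.sqrt(8 * (n - 2)))
d = -lam * X.sum(axis=0)
a = np.sqrt(1 + d ** 2)
print(orthogonality_ratio(X, a, d))                      # ~0.0
```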

2. Bidirectional associative memories with dilation and translation

A standard bidirectional associative memory in the sense of Kosko (1987, 1988) (in the following shortly called BAM) consists of a finite number of input units (input neurons) which have bidirectional connections to a finite number of output units (output neurons). No other neuron-neuron connections exist. The dynamic states of such a device are essentially ruled by the following signum-type transfer function $T_S: \mathbb{R} \to \{-1, 1, \mathrm{n.d.}\}$,

$$T_S(y) := \begin{cases} -1, & y < 0, \\ \mathrm{n.d.}, & y = 0, \\ 1, & y > 0. \end{cases} \qquad (5)$$

Here, the abbreviation n.d. means that for $y = 0$ the function $T_S$ is not defined. In case that the BAM has precisely $n$ input units and $m$ output units, the network should be able to associate a finite number of, let's say, $t$ bipolar coded vectors $x^{(s)} \in \{-1,1\}^n$ with their respective counterparts $y^{(s)} \in \{-1,1\}^m$, $1 \le s \le t$, even in case of slightly disturbed input information (cf. Kamp and Hasler, 1990 or Hecht-Nielsen, 1990, p. 103, for a more gentle introduction to BAMs). In the following, we will introduce the recall and learning mode for a generalized version of BAMs including dilation and translation of the input and output pattern space.

Definition 2.1. (recall mode). Let $n, m \in \mathbb{N}$ be given arbitrarily. The generalized BAM may have $n$ input neurons, $m$ output neurons, and $mn$ connection weights $w_{ij} \in \mathbb{R}$, $1 \le i \le n$, $1 \le j \le m$. Moreover, it may have $n$ input dilation and translation weights $a_i, d_i \in \mathbb{R}$, $a_i > 0$, $1 \le i \le n$, and $m$ output dilation and translation weights $b_j, f_j \in \mathbb{R}$, $b_j > 0$, $1 \le j \le m$. We assume that all weights (connection, dilation, translation) are generated by means of any reasonable learning algorithm (for example, the one given in Definition 2.3), and the only assumption on the weights we make is the positivity of the dilation weights. Now, in case that an input vector $x^{[0]}$ is entered,

$$x^{[0]} = (x_1^{[0]}, x_2^{[0]}, \ldots, x_n^{[0]}) \in \{-1,1\}^n \qquad (6)$$

and $y^{[-1]}$ is defined as

$$y^{[-1]} := (-1, -1, \ldots, -1) \in \{-1,1\}^m \qquad (7)$$

the generalized BAM generates a sequence of vectors $(x^{[u]}, y^{[u]})_{u \in \mathbb{N}_0}$ via

$$y_j^{[u]} := \begin{cases} T_S\!\left( \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u]} + d_i) \right) & \text{for } \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u]} + d_i) \ne 0, \\[6pt] y_j^{[u-1]} & \text{for } \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u]} + d_i) = 0, \end{cases} \qquad (8)$$

for $1 \le j \le m$ and

$$x_i^{[u+1]} := \begin{cases} T_S\!\left( \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right) & \text{for } \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \ne 0, \\[6pt] x_i^{[u]} & \text{for } \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) = 0, \end{cases} \qquad (9)$$

for $1 \le i \le n$ and $u \in \mathbb{N}_0$. As the output vector, the so-defined recall mode yields the vector $y^{[v]} \in \{-1,1\}^m$ which for the first time satisfies

$$y^{[v]} = y^{[v+1]} \qquad (10)$$

for some $v \in \mathbb{N}_0$.
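A minimal NumPy sketch of this recall mode might look as follows; it is not part of the original paper, and the function and variable names are illustrative.

```python
import numpy as np

def ts(v):
    """Signum-type transfer function T_S of Eq. (5); the zero case is
    handled by the caller, as required by Eqs. (8) and (9)."""
    return np.where(v > 0, 1.0, -1.0)

def bam_recall(W, a, d, b, f, x0, max_iter=1000):
    """Sketch of the generalized BAM recall mode of Definition 2.1.

    W: (n, m) connection weights; a, d: input dilation/translation (a > 0);
    b, f: output dilation/translation (b > 0); x0: bipolar input vector x^[0].
    Returns the first y^[v] with y^[v] = y^[v+1] (Eq. (10))."""
    x = np.asarray(x0, dtype=float)
    y_prev = -np.ones(W.shape[1])                  # y^[-1] := (-1, ..., -1), Eq. (7)
    for u in range(max_iter):
        s_y = (a * x + d) @ W                      # net input to the output layer
        y = np.where(s_y != 0, ts(s_y), y_prev)    # Eq. (8): ties keep y^[u-1]
        if u > 0 and np.array_equal(y, y_prev):    # stable state reached, Eq. (10)
            return y
        s_x = W @ (b * y + f)                      # net input back to the input layer
        x = np.where(s_x != 0, ts(s_x), x)         # Eq. (9): ties keep x^[u]
        y_prev = y
    return y_prev
```

By Theorem 2.2 below, the iteration reaches a stable state after finitely many steps whenever all dilation weights are positive, so the max_iter bound only serves as a safety net.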

Remarks. (1) Note that for $a_i = 1$, $d_i = 0$, $1 \le i \le n$, and $b_j = 1$, $f_j = 0$, $1 \le j \le m$, the above definition reduces to the standard BAM recall mode. (2) The above definition implicitly states that the recall mode terminates at a stable state. We will prove this in the following theorem.

Theorem 2.2. (stable state theorem). Let $n, m \in \mathbb{N}$ be given arbitrarily and let the generalized BAM be initialized exactly according to Definition 2.1. Now, in case that an input vector $x^{[0]}$ is entered,


$$x^{[0]} = (x_1^{[0]}, x_2^{[0]}, \ldots, x_n^{[0]}) \in \{-1,1\}^n \qquad (11)$$

and $y^{[-1]}$ is defined as

$$y^{[-1]} := (-1, -1, \ldots, -1) \in \{-1,1\}^m \qquad (12)$$

there exists a natural number $v \in \mathbb{N}_0$ such that for the sequence of vectors $y^{[u]} \in \{-1,1\}^m$, $u \in \mathbb{N}_0$, generated by the generalized BAM recall mode we have

$$y^{[v]} = y^{[v+u]}, \quad u \in \mathbb{N}_0 \qquad (13)$$

Proof. Let the sequence of vectors

$$(x^{[u]}, y^{[u]}) \in \{-1,1\}^n \times \{-1,1\}^m, \quad u \in \mathbb{N}_0 \qquad (14)$$

be generated according to Definition 2.1. For two arbitrary bipolar coded vectors $(x, y) \in \{-1,1\}^n \times \{-1,1\}^m$, we introduce the so-called energy functional $E$,

$$E(x, y) := -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}(a_i x_i + d_i)(b_j y_j + f_j) \qquad (15)$$

In a first step, we compare the energies $E(x^{[u]}, y^{[u]})$ and $E(x^{[u+1]}, y^{[u]})$, where $u \in \mathbb{N}_0$ may be given arbitrarily:

$$\begin{aligned}
E(x^{[u+1]}, y^{[u]}) - E(x^{[u]}, y^{[u]}) &= -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j)\bigl(a_i x_i^{[u+1]} - a_i x_i^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le i \le n \\ x_i^{[u+1]} \ne x_i^{[u]}}} a_i \left( \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right) \bigl(x_i^{[u+1]} - x_i^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le i \le n \\ x_i^{[u+1]} \ne x_i^{[u]}}} 2 a_i \left| \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right| \\
&\le 0 \qquad (16)
\end{aligned}$$

The two final steps hold because $a_i > 0$ by presupposition, $x_i^{[u+1]} \ne x_i^{[u]}$ since we only considered those neurons in the sum which changed their states, and, finally,

$$x_i^{[u+1]} = T_S\!\left( \sum_{j=1}^{m} w_{ij}(b_j y_j^{[u]} + f_j) \right) \qquad (17)$$

Therefore, we obtain

$$E(x^{[u+1]}, y^{[u]}) \le E(x^{[u]}, y^{[u]}) \qquad (18)$$

and, more precisely,

$$E(x^{[u+1]}, y^{[u]}) < E(x^{[u]}, y^{[u]}) \qquad (19)$$

in case that there is at least one index $i \in \{1, 2, \ldots, n\}$ with

$$x_i^{[u+1]} \ne x_i^{[u]} \qquad (20)$$

Going on, we now compare the energies $E(x^{[u+1]}, y^{[u]})$ and $E(x^{[u+1]}, y^{[u+1]})$:

$$\begin{aligned}
E(x^{[u+1]}, y^{[u+1]}) - E(x^{[u+1]}, y^{[u]}) &= -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}(a_i x_i^{[u+1]} + d_i)\bigl(b_j y_j^{[u+1]} - b_j y_j^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le j \le m \\ y_j^{[u+1]} \ne y_j^{[u]}}} b_j \left( \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u+1]} + d_i) \right) \bigl(y_j^{[u+1]} - y_j^{[u]}\bigr) \\
&= -\sum_{\substack{1 \le j \le m \\ y_j^{[u+1]} \ne y_j^{[u]}}} 2 b_j \left| \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u+1]} + d_i) \right| \\
&\le 0 \qquad (21)
\end{aligned}$$

Here, the two final steps hold because $b_j > 0$ by presupposition, $y_j^{[u+1]} \ne y_j^{[u]}$ since we only considered those neurons in the sum which changed their states, and, finally,

$$y_j^{[u+1]} = T_S\!\left( \sum_{i=1}^{n} w_{ij}(a_i x_i^{[u+1]} + d_i) \right) \qquad (22)$$

Again, we obtain

$$E(x^{[u+1]}, y^{[u+1]}) \le E(x^{[u+1]}, y^{[u]}) \qquad (23)$$

and, more precisely,

$$E(x^{[u+1]}, y^{[u+1]}) < E(x^{[u+1]}, y^{[u]}) \qquad (24)$$

if there exists at least one index $j \in \{1, 2, \ldots, m\}$ with

$$y_j^{[u+1]} \ne y_j^{[u]} \qquad (25)$$

Summing up, we can conclude that for $u \in \mathbb{N}_0$ we always have

$$E(x^{[u+1]}, y^{[u+1]}) < E(x^{[u]}, y^{[u]}) \qquad (26)$$

if there exists at least one index $i \in \{1, 2, \ldots, n\}$ with

$$x_i^{[u+1]} \ne x_i^{[u]} \qquad (27)$$

or an index $j \in \{1, 2, \ldots, m\}$ with

$$y_j^{[u+1]} \ne y_j^{[u]} \qquad (28)$$

Since the set $\{-1,1\}^n \times \{-1,1\}^m$ is finite, the energy functional $E$ can properly decrease on $(x^{[u]}, y^{[u]}) \in \{-1,1\}^n \times \{-1,1\}^m$ only a finite number of times.


Therefore, there must exist a natural number $v \in \mathbb{N}_0$ satisfying

$$E(x^{[v+1]}, y^{[v+1]}) = E(x^{[v]}, y^{[v]}) \qquad (29)$$

Using the above arguments this immediately implies

$$y^{[v]} = y^{[v+1]} \qquad (30)$$

resp.

$$y^{[v]} = y^{[v+u]}, \quad u \in \mathbb{N}_0 \qquad (31)$$

□
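The energy functional (15) used in this proof reduces to a one-line NumPy expression; tracking it along the iterations of the recall sketch given after Definition 2.1 makes the monotone decrease of Eqs. (18) and (23) visible. The helper below is illustrative and not part of the original paper.

```python
import numpy as np

def bam_energy(W, a, d, b, f, x, y):
    """Energy functional E of Eq. (15) for a bipolar state pair (x, y):
    E(x, y) = -sum_ij w_ij (a_i x_i + d_i)(b_j y_j + f_j)."""
    return -float((a * x + d) @ W @ (b * y + f))
```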

At the moment, the connection, dilation and translation weights are not yet tuned in order to store any bipolar associations as given by Eq. (1). This will be done in the following definition.

Definition 2.3. (learning mode). Let $n, m \in \mathbb{N}$ be given arbitrarily. The generalized BAM may have $n$ input neurons, $m$ output neurons, and $mn$ connection weights $w_{ij} \in \mathbb{R}$, $1 \le i \le n$, $1 \le j \le m$. Moreover, it may have $n$ input dilation and translation weights $a_i, d_i \in \mathbb{R}$, $a_i > 0$, $1 \le i \le n$, and $m$ output dilation and translation weights $b_j, f_j \in \mathbb{R}$, $b_j > 0$, $1 \le j \le m$. Now, in case that $t$ bipolar coded associations

$$(x^{(s)}, y^{(s)}) \in \{-1,1\}^n \times \{-1,1\}^m, \quad 1 \le s \le t \qquad (32)$$

are entered into the network in order to be stored, then the algorithm which sets

$$d_i := -\lambda \sum_{s=1}^{t} x_i^{(s)}, \quad a_i := \sqrt{1 + d_i^2}, \quad 1 \le i \le n \qquad (33)$$

$$f_j := -\mu \sum_{s=1}^{t} y_j^{(s)}, \quad b_j := \sqrt{1 + f_j^2}, \quad 1 \le j \le m \qquad (34)$$

$$w_{ij} := \sum_{s=1}^{t} (a_i x_i^{(s)} + d_i)(b_j y_j^{(s)} + f_j), \quad 1 \le i \le n,\ 1 \le j \le m \qquad (35)$$

with $\lambda, \mu \ge 0$ free learning parameters, is called the generalized Hebb learning scheme.

Remarks. (1) Note that for $\lambda = \mu = 0$ or $d_i = 0 = f_j$, $1 \le i \le n$, $1 \le j \le m$, the generalized Hebb learning scheme defined above reduces to the usual Hebb learning scheme for standard BAMs. (2) In Lenze (1998) a generalized Hebb learning scheme for sigma-pi Hopfield neural networks has been introduced which is quite similar. However, in the sigma-pi setting a proper discrete orthogonality argument was in the background, which is missing in the BAM context.
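A sketch of the generalized Hebb learning scheme of Eqs. (33)-(35), again illustrative rather than authoritative; the function name and array layout are assumptions.

```python
import numpy as np

def generalized_hebb(X, Y, lam=0.0, mu=0.0):
    """Sketch of the generalized Hebb learning scheme (Eqs. (33)-(35)).

    X: (t, n) input patterns, Y: (t, m) output patterns, entries in {-1, +1};
    lam, mu >= 0 are the free learning parameters.
    Returns the weights W and the dilation/translation vectors a, d, b, f."""
    d = -lam * X.sum(axis=0)            # Eq. (33): translation of the input space
    a = np.sqrt(1.0 + d ** 2)           # Eq. (33): dilation, automatically positive
    f = -mu * Y.sum(axis=0)             # Eq. (34): translation of the output space
    b = np.sqrt(1.0 + f ** 2)           # Eq. (34): dilation of the output space
    W = (X * a + d).T @ (Y * b + f)     # Eq. (35): w_ij = sum_s (a_i x_i + d_i)(b_j y_j + f_j)
    return W, a, d, b, f
```

With lam = mu = 0 this reduces to the classical Hebb rule (2), in line with Remark (1) above.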

With Definition 2.3 we now have a learning scheme for our generalized dilation and translation BAMs; however, at the moment we don't have any theoretical arguments for choosing the parameters in this way. A first basic theoretical justification for our choice will be given in the following theorem, which refers to our initial problem, namely, to minimize Eq. (4), especially in case of highly correlated patterns. Since a completely general result cannot be expected because of the nonlinearity of the underlying minimization problem, we focus our attention on a special case of extremely correlated information and discuss more arbitrary situations afterwards.

Theorem 2.3. (reorthogonalization theorem). Let $n \in \mathbb{N}$, $n \ge 3$, be given arbitrarily and let $x^{(s)} \in \{-1,1\}^n$, $1 \le s \le n$, be defined as

$$x_i^{(s)} := \begin{cases} -1, & \text{for } i = s, \\ 1, & \text{for } i \ne s, \end{cases} \qquad 1 \le i \le n,\ 1 \le s \le n \qquad (36)$$

Then, for

$$\lambda := \frac{n-4}{(n-2)\sqrt{8(n-2)}} \qquad (37)$$

$$d_i := -\lambda \sum_{s=1}^{n} x_i^{(s)}, \quad 1 \le i \le n \qquad (38)$$

$$a_i := \sqrt{1 + d_i^2}, \quad 1 \le i \le n \qquad (39)$$

the following estimate holds:

$$\sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) = \begin{cases} 0 & \text{for } r \ne s,\ 1 \le r, s \le n, \\[4pt] \dfrac{n^2}{2(n-2)} & \text{for } r = s,\ 1 \le r, s \le n. \end{cases} \qquad (40)$$

Proof. Because of the very special definition of the $x^{(s)} \in \{-1,1\}^n$, $1 \le s \le n$, given in Eq. (36), the translation parameters $d_i$, $1 \le i \le n$, immediately reduce to

$$d_i = d := -\lambda(n-2), \quad 1 \le i \le n \qquad (41)$$

which implies the unique dilation parameters

$$a_i = a := \sqrt{1 + \lambda^2(n-2)^2}, \quad 1 \le i \le n \qquad (42)$$

With the special setting of $\lambda$ as fixed in Eq. (37) we now obtain the following two cases for $1 \le r, s \le n$.

1. $r \ne s$:

$$\begin{aligned}
\sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) &= 2(-a+d)(a+d) + (n-2)(a+d)^2 \\
&= 2\Bigl(d^2 - \bigl(\sqrt{1+d^2}\bigr)^2\Bigr) + (n-2)\Bigl(\sqrt{1+d^2} + d\Bigr)^2 \\
&= -2 + (n-2)\left( \sqrt{1 + \frac{(n-4)^2}{8(n-2)}} - \frac{n-4}{\sqrt{8(n-2)}} \right)^2 \\
&= -2 + (n-2)\left( 1 + \frac{(n-4)^2}{8(n-2)} - \frac{2(n-4)}{8(n-2)} \sqrt{8(n-2) + (n-4)^2} + \frac{(n-4)^2}{8(n-2)} \right) \\
&= -2 + \frac{1}{4}\Bigl( 4(n-2) + (n-4)^2 - (n-4)n \Bigr) \\
&= 0, \qquad (43)
\end{aligned}$$

where the next-to-last step uses $8(n-2) + (n-4)^2 = n^2$, i.e. $\sqrt{8(n-2) + (n-4)^2} = n$.

2. $r = s$:

$$\begin{aligned}
\sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) &= (-a+d)^2 + (n-1)(a+d)^2 \\
&= \left( \sqrt{1 + \frac{(n-4)^2}{8(n-2)}} + \frac{n-4}{\sqrt{8(n-2)}} \right)^2 + (n-1)\left( \sqrt{1 + \frac{(n-4)^2}{8(n-2)}} - \frac{n-4}{\sqrt{8(n-2)}} \right)^2 \\
&= 1 + \frac{(n-4)^2}{4(n-2)} + \frac{n(n-4)}{4(n-2)} + (n-1)\left( 1 + \frac{(n-4)^2}{4(n-2)} - \frac{n(n-4)}{4(n-2)} \right) \\
&= n + \frac{n(n-4)^2}{4(n-2)} - \frac{(n-2)n(n-4)}{4(n-2)} \\
&= \frac{n^2}{2(n-2)}. \qquad (44)
\end{aligned}$$

□
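The claim of Eq. (40) can also be checked numerically for a concrete $n$; the following sketch is not part of the original paper, and the value $n = 7$ is chosen arbitrarily.

```python
import numpy as np

n = 7                                              # any n >= 3; 7 is chosen arbitrarily
X = np.ones((n, n))
np.fill_diagonal(X, -1)                            # the patterns of Eq. (36)

lam = (n - 4) / ((n - 2) * np.sqrt(8 * (n - 2)))   # Eq. (37)
d = -lam * X.sum(axis=0)                           # Eq. (38)
a = np.sqrt(1 + d ** 2)                            # Eq. (39)

G = (X * a + d) @ (X * a + d).T                    # all inner products of Eq. (40)
off_diag = G - np.diag(np.diag(G))
print(np.allclose(off_diag, 0.0))                  # True: the r != s case of Eq. (40)
print(np.allclose(np.diag(G), n**2 / (2 * (n - 2))))  # True: the r = s case of Eq. (40)
```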

Of course, the above theorem only deals with a very special situation in which even the parameter $\lambda$ can be explicitly fixed in order to reorthogonalize. In general, however, the situation is much more complicated. Therefore, based on recent results we will now give some more heuristic motivation for our learning mode definitions Eqs. (33)-(35). First of all, combining Eqs. (15) and (35), the energy of a pair of patterns $(x^{(r)}, y^{(r)}) \in \{-1,1\}^n \times \{-1,1\}^m$, $1 \le r \le t$, can be calculated via

$$\begin{aligned}
E(x^{(r)}, y^{(r)}) &= -\sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{s=1}^{t} (a_i x_i^{(s)} + d_i)(b_j y_j^{(s)} + f_j)(a_i x_i^{(r)} + d_i)(b_j y_j^{(r)} + f_j) \\
&= -\sum_{s=1}^{t} \left( \sum_{i=1}^{n} (a_i x_i^{(s)} + d_i)(a_i x_i^{(r)} + d_i) \right) \left( \sum_{j=1}^{m} (b_j y_j^{(s)} + f_j)(b_j y_j^{(r)} + f_j) \right) \qquad (45)
\end{aligned}$$

    For a further discussio...