
Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June 14-19, 2009

Bidirectional Associative Memories, Self-Organizing Maps and k-Winners-Take-All: Uniting Feature Extraction and Topological Principles

Sylvain Chartier, Gyslain Giguere, Dominic Langlois and Rana Sioufi

Abstract— In this paper, we introduce a network combining k-Winners-Take-All and Self-Organizing Map principles within a Feature-Extracting Bidirectional Associative Memory. When compared with its "strictly winner-take-all" version, the modified model shows increased clustering performance, producing a better weight distribution and a lower dispersion level (higher density) for each given category. Moreover, because the model is recurrent, it is able to develop prototype representations strictly from exemplar encounters. Finally, just like any recurrent associative memory, the model keeps its reconstructive memory and noise-filtering properties.

I. INTRODUCTION

In everyday life, humans are continually required to organize and reorganize perceptual patterns into categories on the basis of new incoming information, in order to act upon the identity and properties of the encountered stimuli. While it is generally agreed that the cognitive system uses some type of feature system to achieve these operations, the idea of a perceptual feature system (e.g. [1]), in which properties would be 1) defined in a strictly iconic fashion, and 2) self-created in major part by associative bottom-up processes (such as those found in connectionist models), is quite recent. Representation-wise, cognitive scientists have argued endlessly over the exclusive use of either generic abstractions (such as prototypes) or very specific perceptual stimulations (such as complete exemplars) to achieve proper category learning and classification. Seemingly, the use of one of these representations is likely to be closely linked to specific environmental demands and system goals [2].

S. Chartier is with the School of Psychology, University of Ottawa, 550 Cumberland Rd, Ottawa, ON, K1N 6N5, Canada (e-mail: [email protected]) (corresponding author).
G. Giguere is with the Departement d'informatique, Universite du Quebec a Montreal, C.P. 8888, Succ. CV, Montreal, QC, H3C 3P8, Canada (e-mail: [email protected]).
D. Langlois is with the School of Psychology, University of Ottawa, 550 Cumberland Rd, Ottawa, ON, K1N 6N5, Canada (e-mail: [email protected]).
R. Sioufi is with the School of Psychology, University of Ottawa, 550 Cumberland Rd, Ottawa, ON, K1N 6N5, Canada.

Overall, category formation in a perceptual framework is often seen as a process akin to classic clustering techniques, which involve partitioning stimulus spaces into a number of finite sets, or clusters. In cognitive modeling, Principal Component Analysis (PCA) and Independent Component Analysis (ICA) neural networks have been shown to achieve clustering (see [3] for a review). Recently, [4] have shown that their Feature-Extracting Bidirectional Associative Memory (FEBAM) can lead to the unification of PCA/ICA networks and Bidirectional Heteroassociative Memories (BHMs) under a common framework. More precisely, FEBAM, which is based on a Bidirectional Associative Memory (BAM) framework, has been shown to use the equivalent of a nonlinear PCA learning rule.

Of course, clustering in neural nets is not limited to PCA networks. In fact, competitive networks (e.g. [5], [6]) constitute local, dynamic versions of clustering algorithms. In these models, each output unit represents a specific cluster. When a decision is taken, the association between an exemplar and its determined cluster unit in the output layer is strengthened. In winner-take-all (WTA) networks [5], [6], exemplars may only be associated with one cluster (i.e. only one output unit at a time can be activated).

An example of one such hard competitive framework is that of the Adaptive Resonance Theory (ART: [5]). ART networks possess the advantage of being able to deal effectively with the exemplars/prototype scheme, while solving the stability/plasticity dilemma. These unsupervised models achieve the desired behavior through the use of a novelty detector (using vigilance). Various degrees of generalization can be achieved by this procedure: low vigilance parameter values lead to the creation of broad categories, while high values lead to narrow categories, with the network ultimately performing exemplar learning.

Another example of this framework is the self-organizing feature map (SOFM: [6]), which, in addition, uses a topological representation of inputs and outputs. Although SOFMs only consider one active output unit at a time, the learning algorithm also allows physically close neighboring units to update their connection weights. In a SOFM, an exemplar may thus, for instance, be geometrically positioned between two clusters, and possess various degrees of membership. Recently, we have proposed a Bidirectional Associative Memory (BAM) model that was modified to take into account the properties of SOFMs [7].

An extension of the WTA principle selects the k largest outputs from the total n outputs [8]. This k-winners-take-all (kWTA) rule is thus a more general case of the WTA principle, within which exemplars may be associated with many clusters at differing degrees. This procedure provides a more distributed classification. It was shown in [7] that the modified model can create exemplar and/or prototype (categorical) representations, according to the number of k winning units allowed.

In this study, FEBAM will be further enhanced by using kWTA and SOFM properties simultaneously. This modification will enable the network to increase its clustering capacity; hence, higher network performance and readability should follow. In addition, this kWTA-SOFM version of FEBAM will allow for sparse coding, a distributed representation principle supported by neuropsychological findings [9]. Previous propositions based on FEBAM will now become special cases of this more general model. Therefore, the goal of the study is to propose a general model based on BAMs that shows properties similar to those found in other network types, such as competitive behavior in SOFMs.

II. THEORETICAL BACKGROUND

A. SOFMs and Other General Competitive Models

Competitive models are based on a linear output function defined by

    y = Wx                                                        (1)

where x is the original input vector, W is the weight matrix, and y is the output. Weight connection reinforcement follows a Hebbian learning principle with a forgetting term¹ g(y_j)w_j, where g(y_j) must satisfy the following requirement:

    g(y_j) = 0  for  y_j = 0                                      (2)

The learning function can therefore be expressed by

    w_j(p+1) = w_j(p) + η y_j x − g(y_j) w_j(p)                   (3)

where w_j represents neuron j's weight vector, η represents a learning parameter, and p is the learning trial number. The specific choice of function g(y_j) can lead to two different unsupervised model rules. Equation 3 will lead to Kohonen's [6] competitive learning rule if

    g(y_j) = η y_j                                                (4)

and to a PCA learning rule [11] if

    g(y_j) = η y_j²                                               (5)

Hence, both PCA and competitive neural networks constitute special cases of Hebbian learning using a forgetting term.

In a WTA perspective, the largest output must be selected. This allows for the detection of the winning unit and, for SOFMs, of the topological neighborhood center's position. This is equivalent to minimizing the Euclidean distance between the input vector x and the weight vector w_j. If i(x) identifies the closest match for x, then the winning unit is determined by

    i(x) = argmin_j ||x − w_j||,   j = 1, ..., n                  (6)

In the original SOFM, the output forms an array (grid) in which the neurons are geometrically arranged. The choice of a topological neighborhood h is based on a Gaussian function:

    h_{j,i(x)} = exp(−d²_{j,i(x)} / (2σ²))                        (7)

where d_{j,i(x)} = ||r_j − r_i||, r_j defines the position of neuron j in the output array, r_i defines the position of winning neuron i, and σ represents the neighborhood size. The learning function can then be expressed by

    w_j(p+1) = w_j(p) + η(p) h_{j,i(x)}(p) (x − w_j(p))           (8)

Therefore, the connection between input x and winning unit i(x) will be maximally reinforced, and the level of strengthening of the remaining connections will be lower as the distance from the winning unit d_{j,i(x)} increases. The size of the topological neighborhood and the value of the learning parameter on trial p are also required to decrease with time. This is done by using exponential decay:

    σ(p) = σ₀ exp(−p/τ₁)   and   η(p) = η₀ exp(−p/τ₂)

where τ₁ and τ₂ represent distinct time constants, and σ₀ and η₀ respectively represent the initial values for the neighborhood size parameter σ and the learning parameter η [24].

As noted in the Introduction, the kWTA rule [8] generalizes this WTA principle by selecting the k largest of the n outputs, so that exemplars may be associated with several clusters at differing degrees, yielding a more distributed classification.

¹ Using a forgetting term prevents the model's weight values from growing indefinitely, which would make the model "explode", as with strict Hebbian learning [10].

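To make the preceding background concrete, the following Python/NumPy sketch implements one learning trial of a standard SOFM as summarized by Equations 6-8, together with the exponential decay of the neighborhood size and learning parameter. It is an illustration only, not code from the paper; all names and default values (som_step, tau1, tau2, positions) are our own assumptions.

import numpy as np

def som_step(W, x, positions, p, eta0=0.1, sigma0=10.0, tau1=1000.0, tau2=1000.0):
    # W         : (n_units, n_features) weight matrix, one row per output unit
    # x         : (n_features,) input vector
    # positions : (n_units, 2) grid coordinates r_j of the output units
    # p         : current learning trial
    # Equation 6: the winner is the unit whose weight vector is closest to x
    i_win = np.argmin(np.linalg.norm(W - x, axis=1))
    # Exponential decay of the neighborhood size and of the learning parameter
    sigma = sigma0 * np.exp(-p / tau1)
    eta = eta0 * np.exp(-p / tau2)
    # Equation 7: Gaussian topological neighborhood centred on the winner
    d2 = np.sum((positions - positions[i_win]) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # Equation 8: every unit moves toward x, scaled by its neighborhood value
    W += eta * h[:, None] * (x - W)
    return W, i_win

For a 10 x 10 map, the grid coordinates can be built once with positions = np.indices((10, 10)).reshape(2, -1).T.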
III. FEBAM: THE FEATURE-EXTRACTING BIDIRECTIONAL ASSOCIATIVE MEMORY

Like any artificial neural network, the Feature-Extracting Bidirectional Associative Memory (FEBAM) can be entirely described by its architecture, as well as by its output and learning functions.

A. Architecture

FEBAM's original architecture is illustrated in Figure 1. This architecture is nearly identical to that of the Bidirectional Heteroassociative Memory (BHM) model proposed by [12]. It consists of two Hopfield-like neural networks interconnected in head-to-toe fashion [13]. When connected, these networks allow a recurrent flow of information that is processed bidirectionally. As shown in Figure 1, the W layer returns information to the V layer and vice versa, in a kind of "top-down/bottom-up" fashion. As in a standard BAM, each layer serves as a teacher for the other layer, and the connections are explicitly depicted in the model (as opposed to multi-layer perceptrons). However, to enable a BAM to perform feature extraction, one set of those explicit connections must be removed.

Fig. 1. FEBAM network architecture.

In a standard BAM, the external links given by the initial y(0) connections would also be available to the W layer. Here, the external links are provided solely through the initial x(0) connections on the V layer. Thus, in contrast with the standard architecture [12], y(0) is not obtained externally, but is instead acquired by iterating once through the network, as illustrated in Figure 2.

Fig. 2. Output iterative process used for learning updates.

In the context of feature extraction, the W layer will be used to extract the features, whose dimensionality will be determined by the number of y units; the more units in that layer, the less compression will occur. If the y layer's dimensionality is greater than or equal to that of the x layer, then no compression occurs, and the network should behave as an autoassociative memory [14].

B. Output function

The output function is expressed by the following equations:

    For i = 1, ..., N:
        y_i(t+1) = 1                                  if [Wx(t)]_i > 1
        y_i(t+1) = −1                                 if [Wx(t)]_i < −1          (9)
        y_i(t+1) = (δ+1)[Wx(t)]_i − δ([Wx(t)]_i)³     otherwise

and

    For i = 1, ..., M:
        x_i(t+1) = 1                                  if [Vy(t)]_i > 1
        x_i(t+1) = −1                                 if [Vy(t)]_i < −1          (10)
        x_i(t+1) = (δ+1)[Vy(t)]_i − δ([Vy(t)]_i)³     otherwise

where N and M are the numbers of units in each layer, i is the index of the respective vector element, y(t+1) and x(t+1) represent the network outputs at time t+1, and δ is a general positive output parameter that should be fixed at a value lower than 0.5 to ensure fixed-point behavior [14], [15]. Figure 3 illustrates the shape of the output function when δ = 0.4. This output function possesses the advantage of exhibiting continuous-valued (gray-level) attractor behavior (for a detailed example see [12]). Such properties contrast with networks using a standard nonlinear output function, which can only exhibit bipolar attractor behavior (e.g. [16]).

Fig. 3. Output function for δ = 0.4.

C. Learning function

Learning is based on time-difference Hebbian association [10], [12], [14], [17]-[19], and is formally expressed by the following equations:

    W(p+1) = W(p) + η (y(0) − y(t)) (x(0) + x(t))^T               (11)

and

    V(p+1) = V(p) + η (x(0) − x(t)) (y(0) + y(t))^T               (12)

where η represents a learning parameter; y(0) and x(0) are the initial patterns at t = 0; y(t) and x(t) are the state vectors after t iterations through the network; and p is the learning trial. The learning rule is thus very simple, and can be shown to constitute a generalization of Hebbian/anti-Hebbian correlation in its autoassociative memory version ([12], [14]). For weight convergence to occur, η must be set according to the following condition [17]:

    η < 1 / (2(1 − 2δ) Max[N, M]),   δ ≠ 1/2                      (13)

Equations 11 and 12 show that the weights can only converge when the "feedback" is identical to the initial inputs (that is, y(t) = y(0) and x(t) = x(0)). The function therefore correlates directly with network outputs, instead of activations. As a result, the learning rule is dynamically linked to the network's output (unlike most BAMs).

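As an illustration of Section III, the output function of Equations 9-10, the learning rule of Equations 11-12 and the bound of Equation 13 can be sketched as follows in Python/NumPy. The helper names (transmit, febam_update, eta_bound) and the use of a single cycle through the network (t = 1) are our assumptions for the sketch, not prescriptions from the paper.

import numpy as np

def transmit(a, delta=0.1):
    # Equations 9-10: saturate at +/-1, cubic transmission in between
    out = (delta + 1.0) * a - delta * a ** 3
    out[a > 1.0] = 1.0
    out[a < -1.0] = -1.0
    return out

def febam_update(W, V, x0, eta=0.01, delta=0.1):
    # One learning trial with t = 1 (a single pass through the network, cf. Figure 2)
    y0 = transmit(W @ x0, delta)              # initial output, obtained by iterating once
    x1 = transmit(V @ y0, delta)              # reconstructed input
    y1 = transmit(W @ x1, delta)              # compressed output after one cycle
    W += eta * np.outer(y0 - y1, x0 + x1)     # Equation 11
    V += eta * np.outer(x0 - x1, y0 + y1)     # Equation 12
    return W, V, np.linalg.norm(y0 - y1)      # the error later used as a stopping criterion

def eta_bound(N, M, delta=0.1):
    # Equation 13: upper bound on the learning parameter (delta != 0.5)
    return 1.0 / (2.0 * (1.0 - 2.0 * delta) * max(N, M))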
To emulate WTA and kWTA processes, the perceptual feature extraction (or compression) layer's (y) architecture must be further constrained by the addition of inhibitory connections from each "compression" unit to all units of that same layer. The W connection matrix will be used to memorize the mappings between the input and the associated constrained representation, while the V connection matrix will be used to extract constrained versions of the input patterns.

Fig. 4. Modified FEBAM architecture for kWTA competitive processes using a rectangular 2D topology.

As more and more y units become active (i.e. as we move to a kWTA-type setting), the network evolves toward a more distributed kind of representation. It can be seen here that the original FEBAM constitutes a special case of a kWTA process in which all compression units can be used; inhibitory connection values are then all set to zero. If the network's architecture is further constrained using a rectangular 2D topology, the network can behave like a SOFM (Figure 4). A SOFM not only categorizes the input data, it also recognizes which input patterns are close to each other in stimulus space. In order to extend FEBAM into a SOFM, the original topological neighborhood function of the SOFM was extended into a "Mexican-hat" function (Figure 5). This function is well suited to bipolar stimuli and is described by:

    (14)

where r_j defines the position of neuron j, r_i defines the position of winning neuron i, and σ represents the neighborhood size. The size of the topological neighborhood on trial p is also required to decrease with time. This is done by using exponential decay:

    σ(p) = σ₀ exp(−p/τ)                                           (15)

where τ represents a time constant, and σ₀ represents the initial value of the neighborhood size σ.

Fig. 5. "Mexican-hat" function for p = 1, τ = 435, σ₀ = 10.

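Before turning to the simulations, the kWTA selection and the topological constraint can also be sketched in Python/NumPy. Since the exact form of Equation 14 is not reproduced above, the mexican_hat function below uses a generic Ricker ("Mexican-hat") profile as a stand-in, and the decision to force non-winning units to -1 reflects our reading of the bipolar coding; both are assumptions, not the authors' specification.

import numpy as np

def kwta(y, k):
    # Keep the k largest activations as winners; suppress the others to -1 (bipolar coding)
    out = np.full_like(y, -1.0)
    winners = np.argsort(y)[-k:]
    out[winners] = 1.0
    return out, winners

def mexican_hat(positions, winners, sigma):
    # Generic Mexican-hat neighborhood centred on the winning units (stand-in for Equation 14)
    d2 = np.min([np.sum((positions - positions[w]) ** 2, axis=1) for w in winners], axis=0)
    u = d2 / sigma ** 2
    return (1.0 - u) * np.exp(-u / 2.0)   # peaks at 1 on the winners, dips below 0 around them

With a 10 x 10 map and k = 15, kwta marks 15 grid units as active and mexican_hat then reinforces units close to them while inhibiting those farther away, in the spirit of the clustered maps of Figures 12 and 13.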
IV. SIMULATIONS

A. Simulation 1: A case study

A first simulation was conducted in order to demonstrate the network's ability to cluster exemplars into categories. For this simulation, artificial pixel-based stimuli were created. Stimuli examples are shown in Figure 6.

Fig. 6. Examples of patterns used to train the network (Categories 1 to 4).

1) Methodology: Four category prototypes were produced by generating bipolar-valued vectors, for which the value of each vector position (or "feature") followed a random discrete uniform distribution. The presence of a feature (black pixel) was represented by a value of +1, and the absence of a feature (white pixel) by a value of −1. Each prototype vector comprised 36 features. Five exemplars were generated using each prototype, for a total of 20 items. Each exemplar was created by randomly "flipping" the value of between one and four features. The average intra-category correlation was 0.72 and the average inter-category correlation was 0.08.

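A minimal sketch of this stimulus-generation step, under the assumptions just described (bipolar vectors of 36 features, five exemplars per prototype, one to four flipped features), could look as follows in Python/NumPy; the function name make_stimuli and the random seed are ours.

import numpy as np

rng = np.random.default_rng(0)

def make_stimuli(n_categories=4, n_features=36, n_exemplars=5):
    # Bipolar prototypes; exemplars are created by flipping 1 to 4 randomly chosen features
    prototypes = rng.choice([-1.0, 1.0], size=(n_categories, n_features))
    exemplars = []
    for proto in prototypes:
        for _ in range(n_exemplars):
            ex = proto.copy()
            flips = rng.choice(n_features, size=rng.integers(1, 5), replace=False)
            ex[flips] *= -1.0
            exemplars.append(ex)
    return prototypes, np.array(exemplars)   # exemplars: 20 items of 36 features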
Learning followed the procedure below:
0. Random weight initialization;
1. Random selection of a given exemplar (x(0));
2. Computation of the activation according to Equation 9;
3. Selection of the k winners;
4. Computation of the initial output (y(0)) according to Equation 9 and Equation 14;
5. Computation of the reconstructed input (x(1)) according to Equation 10;
6. Computation of the compressed output (y(1)) according to Equation 9;
7. Weight updates according to Equations 11 and 12;
8. Repetition of steps 1 to 7 until the error (||y(0) − y(1)||) is sufficiently low (< 0.1).

For the simulation, the learning (η) and transmission (δ) parameters were respectively set to 0.01 and 0.1. Those values respect the requirements described in [12], [17]. The number of y units was set to 100 (10×10 grid), and the initial value of σ was accordingly set to 10 (√100; [24]). Finally, the number of winning units was set to 15, a value that allows for good performance (see Simulation 2).

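Putting the pieces together, the training procedure above can be sketched as a loop that reuses the transmit, kwta, mexican_hat and make_stimuli helpers from the earlier sketches. How Equations 9 and 14 are combined in step 4 is not fully specified here, so the elementwise modulation below is our guess; the parameter values (eta = 0.01, delta = 0.1, a 10 x 10 map, k = 15, sigma0 = 10, tau = 435) follow the text and the caption of Figure 5.

import numpy as np

def train(exemplars, n_units=100, k=15, eta=0.01, delta=0.1,
          sigma0=10.0, tau=435.0, tol=0.1, max_trials=10000):
    rng = np.random.default_rng(1)
    n_features = exemplars.shape[1]
    W = rng.uniform(-0.1, 0.1, (n_units, n_features))     # step 0: random weights
    V = rng.uniform(-0.1, 0.1, (n_features, n_units))
    side = int(np.sqrt(n_units))                           # square output grid
    positions = np.indices((side, side)).reshape(2, -1).T
    for p in range(max_trials):
        x0 = exemplars[rng.integers(len(exemplars))]       # step 1: pick an exemplar
        a = transmit(W @ x0, delta)                        # step 2: activation (Eq. 9)
        _, winners = kwta(a, k)                            # step 3: select k winners
        sigma = sigma0 * np.exp(-p / tau)                  # Eq. 15: shrinking neighborhood
        y0 = transmit(a * mexican_hat(positions, winners, sigma), delta)   # step 4
        x1 = transmit(V @ y0, delta)                       # step 5: reconstructed input (Eq. 10)
        y1 = transmit(W @ x1, delta)                       # step 6: compressed output
        W += eta * np.outer(y0 - y1, x0 + x1)              # step 7: Eq. 11
        V += eta * np.outer(x0 - x1, y0 + y1)              #         Eq. 12
        if np.linalg.norm(y0 - y1) < tol:                  # step 8: stop when the error is low
            break
    return W, V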
2) Results: Figure 7 shows the error as a function of the number of learning trials. In this particular case, the error dropped below 0.1 after 236 learning trials, and learning therefore stopped.

Fig. 7. Error as a function of the number of learning trials.

Figure 8 illustrates to which category each of the weights from the V matrix is associated. Each category is represented by a different color, and each panel represents the "categorical landscape" after a certain number of learning trials. It can be seen that as the number of learning trials increases, category sensitivity is adequately represented topologically in the matrix. Moreover, each category occupies approximately 25% of the total space. On average (over 20 trials), 25 ± 4.3 pixels are associated with each given category, whereas a SOFM associates about 25 ± 7.1 pixels with each category (Figure 9a).

Fig. 8. 2-D topology as a function of the number of learning trials (p).
Fig. 9. SOFM: a) 2-D topology and b) corresponding weight connections.

The main difference between the present model and SOFMs is the way the weight connections encode information. In SOFMs, the weight connections represent the input set (Figure 9b), while in FEBAM, the weights are not directly interpretable (Figure 10). Prototypes can be extracted if the initial inputs (Figure 6; x(0)) are iterated until convergence (x(c)); their invariant states are illustrated in Figure 11. As can be seen, each separate category exemplar reaches the same attractor: the network develops four attractors, one for each category. The recurrent nature of the model allows for the development of such invariant states. In this case, since the variability within each category is quite high and since the number of output units (100) is low, the network extracts the prototype from the exemplars. However, if one is to store specific exemplars, then the number of output units must be increased.²

Fig. 10. Weight connections of FEBAM: a) W and b) V.
Fig. 11. Converged inputs for the x layer.

² Of course, there are other ways to encode the desired number of exemplars more efficiently; for example, a vigilance parameter could be used [5].


Figure 12 illustrates the 15 winning units selected as a function of each original input from Figure 6. As shown, these 15 winning units are not spread all over the 2-D map; rather, they form clusters. Within each category, a small amount of variability is observed: in category 1, for example, outputs 1 to 3 are exactly the same, and outputs 4 and 5 are exactly the same. However, this small variability disappears once the outputs go through the "Mexican-hat" function (Figure 13). The network then outputs different patterns only for different categories.

Fig. 12. 2-D representation of the 15 winning units as a function of each input shown in Figure 6.
Fig. 13. 2-D representation of the outputs (y layer) after the 15 winning units have been put through the "Mexican-hat" function.

B. Simulation 2: General properties

We tested the effect of varying the number of winning units using two measures: 1) a dispersion index, measured by calculating the distance between the weights representing each category (thus quantifying cluster variability on the map), and 2) the proportion of total weights used to encode each category. The number of winning units varied from 1 to 20 for the first test, and from 1 to 15 for the second one. For each value, the error measure was defined as the proximity between each category's winning units. To determine the average performance of the network, the simulation was repeated 50 times, each time with a different set of patterns.

1) Results: As shown by Figure 14a, the distance between weights (dispersion) decreases as the number of winners increases. The graph shows a decrease in variability as the number of winning units increases from 1 to 10, after which it stabilizes. In other words, the density of the clusters increases with the number of winning units. Figure 14b shows the maximum proportion of weights used to encode a given category. The graph shows that the minimum proportion is close to 25% (theoretical optimality) at around 10 winners and above. With a minimum of 10 winners, the four categories are encoded using the same proportion of weights.

Fig. 14. a) Variability (dispersion) as a function of the number of winning units. b) Proportion (%) of weights used to encode a category as a function of the number of winning units.

From the previous results, if 15 units are allowed to win at the same time, the network should, on average, show correct performance. Results indicate that the network correctly classifies a novel pattern 91% of the time. Moreover, in this case, only 15 units are activated at the same time, and increasing that number does not improve generalization performance. In other words, a sparse representation seems to achieve better performance than a strict local representation. Moreover, a fully distributed representation, like that used in the original FEBAM, constitutes a waste of resources, since the kWTA-SOFM version of the network can achieve the same performance using only approximately 15% of the output units.

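The paper does not give explicit formulas for the dispersion index or the weight proportion, so the following Python/NumPy sketch only shows one plausible way of computing such measures from the grid coordinates of each category's winning units and from a map of best-matching categories; both functions are our interpretation.

import numpy as np

def dispersion(positions, winners_by_category):
    # Mean pairwise distance among a category's winning units, averaged over categories
    scores = []
    for winners in winners_by_category:
        pts = positions[winners]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        scores.append(d[np.triu_indices(len(pts), k=1)].mean())
    return np.mean(scores)

def proportion_per_category(category_map, n_categories=4):
    # Fraction of map units whose weights best match each category (cf. Figures 8-9)
    counts = np.bincount(category_map, minlength=n_categories)
    return counts / category_map.size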
Fig. 15. Set of patterns used for training.


C. Simulation 3: Alphanumeric letters

Simulation 1 was replicated, but this time using stimuli with no predetermined categorical (or cluster) membership. The patterns used for the simulations are shown in Figure 15. Each pattern consisted of a 7 × 7 pixel matrix representing a letter of the alphabet. White and black pixels were respectively assigned values of −1 and +1. Correlations (in absolute values) between the patterns varied from 0.02 to 0.84. Since the network must discriminate between a larger number of patterns than in Simulations 1 and 2 (10 vs. 4), the output grid was increased to 400 units (20 × 20). The learning parameter η was set to 0.001 and the number of winning units was increased to 20 (5% of the output units). Learning was conducted using the procedure described in the previous section. However, in this case, the minimum error was set to 0.005; this criterion is essential if proper reconstruction in the x layer is desired.

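Under the same assumptions as the training sketch of Simulation 1, these settings would amount to a call along the following lines (letter_patterns is a hypothetical 10 x 49 array holding the bipolar letter stimuli):

# Hypothetical call reusing the train() sketch: 7 x 7 letters (49 features),
# a 20 x 20 output grid, eta = 0.001, 20 winners and a stricter stopping criterion
W, V = train(letter_patterns, n_units=400, k=20, eta=0.001, tol=0.005)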
Fig. 16. Error as a function of the number of learning trials.
Fig. 17. 2-D topology of the weight connections. Each letter represents the best-matched pattern for each weight.

1) Results (SOFM properties): Figure 16 shows that the weights converged after approximately 1500 learning trials. Figure 17 shows the best-matched pattern for each weight in the V matrix (this is equivalent to a WTA setting). Each pattern is well separated into a closed, well-delimited region. Although the "H" pattern seems to be encoded into different regions, Figure 18 indicates that only the upper-left region is used to encode that particular pattern. In the y layer, it is clear that only a small portion of the 2-D map is used at one time.

Fig. 18. y-layer output as a function of a given input pattern, after the 20 winning units have gone through the "Mexican-hat" function.
Fig. 19. Reconstructed patterns: inputs x(0) and outputs x(20).

2) Results (Autoassociative properties): Since the model is recurrent, it should also preserve its autoassociative memory properties. Figure 19 shows the reconstructed patterns, or outputs of the x layer, which are invariant states of the network. Overall, the reconstructed patterns closely match the original ones (Figure 15); there are only slight differences in the lower-right region of the letters E and G.

Because the model is recurrent, it can filter noise and complete patterns just like any recurrent associative memory (e.g. [20]). Figure 20 illustrates these properties through a noisy recall task, where different examples of patterns are contaminated with various types of noise (random pixels flipped, addition of normally distributed random noise, or a specific portion of a given pattern removed).

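A noisy recall task of this kind can be sketched as follows, again reusing the transmit helper from the earlier sketch; the three corruption types follow the text, but the noise levels (a fifth of the pixels flipped, Gaussian noise with a standard deviation of 0.5, half of the pattern removed) are arbitrary choices of ours.

import numpy as np

rng = np.random.default_rng(2)

def add_noise(x, kind="flip"):
    # Contaminate a bipolar pattern in one of the three ways described for Figure 20
    x = x.copy()
    if kind == "flip":                                   # random pixels flipped
        idx = rng.choice(x.size, size=x.size // 5, replace=False)
        x[idx] *= -1.0
    elif kind == "gaussian":                             # additive normally distributed noise
        x += rng.normal(0.0, 0.5, x.size)
    elif kind == "occlude":                              # a portion of the pattern removed
        x[: x.size // 2] = -1.0
    return x

def recall(W, V, x0, delta=0.1, n_iter=20):
    # Iterate the trained network to filter the noise (cf. the outputs x(20) of Figure 19)
    x = x0
    for _ in range(n_iter):
        y = transmit(W @ x, delta)
        x = transmit(V @ y, delta)
    return x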
V. DISCUSSION

FEBAM is able to reproduce the properties of SOFMs while keeping the properties of a recurrent associative memory. Being able to introduce topological mapping and k-winners-take-all properties simultaneously into a BAM-like architecture represents a scientific achievement in itself. The recurrent architecture allows for the simultaneous development of prototypes in one layer and of clusters in the other layer. This new property adds to the many other properties of the BHM/FEBAM framework, such as aperiodic recall, many-to-one association, multi-step pattern recognition, blind source separation, perceptual feature extraction, input compression and signal separation ([4], [12], [17], [21], [22]). In short, the network replicates various behaviors shown by recurrent associative memories, principal component analysis and self-organizing maps, while keeping the same learning and output functions, as well as its general bidirectional heteroassociative architecture. This result constitutes a step toward unifying these various classes of networks within a general architecture.

The parameters are constrained by the dimensionality of the inputs (learning parameter η), by analytic results (transmission parameter δ), and by the size of the topology (neighborhood size of the "Mexican-hat", σ₀) and its decrease (time constant τ). However, the time-dependent neighborhood function limits the model's ability to adapt to new patterns that are not part of the original set. This problem is not unique to this model; it affects the majority of SOFMs [5]. One solution would be the inclusion of a vigilance procedure [5], similar to that used in distributed associative memories [23]. Further studies should also explore how the basins of attraction can be optimized as a function of the number of categories to be developed and of the within-category correlations. In addition, a quantitative comparison should be made between the model and its BAM version.

Also, since, in a neuropsychological framework, the presence of local connections is more probable than that of distal connections, further investigations should focus on the interaction between kWTA and SOFM principles within the FEBAM framework, concentrating more specifically on the role of the topological neighborhood, the number of active units and the encoding representation. Finally, further studies will also examine the extent to which this framework can reproduce other kinds of human learning behaviors.

VI. REFERENCES

[1] P. G. Schyns, R. L. Goldstone, and J. P. Thibaut, "The development of features in object concepts," Behavioral and Brain Sciences, vol. 21, pp. 1-54, 1998.
[2] G. L. Murphy, The Big Book of Concepts. Cambridge, MA: MIT Press, 2002.
[3] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks. New York: Wiley, 1996.
[4] S. Chartier, G. Giguere, P. Renaud, R. Proulx, and J.-M. Lina, "FEBAM: A feature-extracting bidirectional associative memory," Proceedings of the International Joint Conference on Neural Networks (IJCNN'07), pp. 1679-1684, August 2007.
[5] S. Grossberg, "Nonlinear neural networks: Principles, mechanisms, and architectures," Neural Networks, vol. 1, pp. 17-61, 1988.
[6] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, pp. 59-69, 1982.
[7] S. Chartier and G. Giguere, "Autonomous perceptual feature extraction in a topology-constrained architecture," in Proceedings of the 30th Annual Conference of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds. Austin, TX: Cognitive Science Society, 2008, pp. 1868-1873.
[8] E. Majani, R. Erlanson, and Y. Abu-Mostafa, "On the k-winners-take-all network," in Advances in Neural Information Processing Systems 1, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1989, pp. 634-642.
[9] B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," Current Opinion in Neurobiology, vol. 14, pp. 481-487, 2004.
[10] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[11] E. Oja, "A simplified neuron model as a principal component analyzer," Journal of Mathematical Biology, vol. 15, pp. 267-273, 1982.
[12] S. Chartier and M. Boukadoum, "A bidirectional heteroassociative memory for binary and grey-level patterns," IEEE Transactions on Neural Networks, vol. 17, pp. 385-396, 2006.
[13] M. H. Hassoun, "Dynamic heteroassociative neural memories," Neural Networks, vol. 2, pp. 275-287, 1989.
[14] S. Chartier and R. Proulx, "NDRAM: Nonlinear dynamic recurrent associative memory for learning bipolar and nonbipolar correlated patterns," IEEE Transactions on Neural Networks, vol. 16, pp. 1393-1400, 2005.
[15] S. Helie, "Energy minimization in the nonlinear dynamic recurrent associative memory," Neural Networks, vol. 21, pp. 1041-1044, 2008.
[16] B. Kosko, "Bidirectional associative memories," IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, pp. 49-60, 1988.
[17] S. Chartier, P. Renaud, and M. Boukadoum, "A nonlinear dynamic artificial neural network model of memory," New Ideas in Psychology, vol. 26, pp. 252-277, 2008.
[18] B. Kosko, "Unsupervised learning in noise," IEEE Transactions on Neural Networks, vol. 1, pp. 44-57, 1990.
[19] E. Oja, "Neural networks, principal components, and subspaces," International Journal of Neural Systems, vol. 1, pp. 61-68, 1989.
[20] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, pp. 2554-2558, 1982.
[21] G. Giguere, S. Chartier, R. Proulx, and J.-M. Lina, "Category development and reorganization using a bidirectional associative memory-inspired architecture," in Proceedings of the 8th International Conference on Cognitive Modeling, R. L. Lewis, T. A. Polk, and J. E. Laird, Eds. Ann Arbor, MI: University of Michigan, 2007, pp. 97-102.
[22] G. Giguere, S. Chartier, R. Proulx, and J.-M. Lina, "Creating perceptual features using a BAM-inspired architecture," in Proceedings of the 29th Annual Conference of the Cognitive Science Society, D. S. McNamara and J. G. Trafton, Eds. Austin, TX: Cognitive Science Society, 2007, pp. 1025-1030.
[23] S. Chartier and R. Proulx, "A self-scaling procedure in unsupervised correlational neural networks," Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), vol. 2, pp. 1092-1096, 1999.
[24] S. Haykin, Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice-Hall, 1999.
