

Neurocomputing 74 (2011) 2361–2367

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

doi:10.1016/j.neucom.2011.03.014

0925-2312/$ - see front matter © 2011 Elsevier B.V. All rights reserved.

Corresponding author. E-mail address: [email protected] (D. Dominguez).

Learning sequences of sparse correlated patterns using small-world attractor neural networks: An application to traffic videos

Mario González a, David Dominguez a,*, Ángel Sánchez b

a EPS, Universidad Autónoma de Madrid, 28049 Madrid, Spain
b DCC-ETSII, Universidad Rey Juan Carlos, 28933 Madrid, Spain

Article info

Article history:

Received 29 July 2010

Received in revised form

22 December 2010

Accepted 12 March 2011

Communicated by G. Palm

Available online 13 April 2011

Keywords:

Attractor network

Small-world

Sparse-coding

Correlated patterns

Temporal sequence

Video retrieval

Traffic analysis


Abstract

The goal of this work is to learn and retrieve a sequence of highly correlated patterns using a Hopfield-type attractor neural network (ANN) with a small-world connectivity distribution. For this model, we propose a weight-learning heuristic which combines the pseudo-inverse approach with a row-shifting schema. The influence of the ratio of random connectivity on retrieval quality and learning time has been studied. Our approach has been successfully tested on complex patterns, as is the case of traffic video sequences, for different combinations of the involved parameters. Moreover, it has proved to be robust with respect to highly variable frame activity.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Video analysis involves processing information from sequences of digital images which are highly correlated in time. In some cases, the video sequences are captured with a single camera, and their analysis exploits the temporal correlation from one frame to the next in the sequence. In other situations, the sequences are obtained from several cameras, and the processing may involve reconstructing three-dimensional scenes from the two-dimensional sequences captured by each camera. Many applications involving video analysis have been presented in domains such as surveillance, manufacturing, and video games, among others [4,12].

The application of video-based analysis to traffic surveillance [6,19] is an area of growing interest, with the aim of detecting both global events (i.e. the number of vehicles in a road region) and local events (i.e. the detection and tracking of a specific vehicle). As large amounts of video data are stored for analyzing the events in them, it becomes very important to develop efficient storage and retrieval techniques for these traffic videos. In general, these videos are sequences of frames where the involved patterns (i.e. moving vehicles) are highly correlated in time, especially in traffic congestion scenes. Most existing works for


this problem use an approach based on scene segmentation followed by vehicle tracking [5]. In it, the vehicles are first detected in the dynamic scene using adaptive-background techniques [18,19], and specific features like texture, color or shape [3] are extracted from the segmented targets for classification. Later, these vehicles are tracked using different techniques like optical flow [23], Kalman filters [16] or particle filters [33], among others. Segmentation and tracking tasks become more difficult in realistic traffic situations with possible vehicle congestion and variability of weather and/or illumination conditions. Moreover, the vehicle tracking results over time are highly dependent on a good segmentation. To avoid the need to segment and track individual vehicles, some holistic representations for the storage and retrieval of traffic videos have been proposed. Chan and Vasconcelos [5] propose a dynamic texture representation to model the motion flow in the scene. They use the Kullback–Leibler divergence and the Martin distance to retrieve and classify traffic videos without the need for segmentation. Xie et al. [38] present another holistic method for traffic video retrieval using hierarchical self-organizing maps (HSOM). They extract the motion trajectories of the vehicles present in the video, and these activity patterns are stored by the neural network; later, this learned knowledge is combined with a semantic indexing stage to retrieve traffic sequences based on queries by keywords.

The aim of our work is to learn and retrieve a sequence of patterns that are highly correlated over time, obtained from a traffic


video sequence. We use a Hopfield-type attractor neural network (ANN) with a small-world connectivity distribution. It is known that, for uniformly distributed (i.e. non-correlated) patterns, the most efficient arrangement for the storage and retrieval of patterns as a whole (global information) by an ANN is the random network. However, small-world networks with a moderate number of shortcuts can be almost as computationally efficient as a random network while saving considerably on wiring costs [2,26,28,24]. Furthermore, for non-uniformly distributed patterns, networks with spatially distributed synapses are more efficient [20].

In order to achieve this objective, one must face some typical problems found in the literature on ANNs [10,22,32]. First, in real-world applications such as video compression/retrieval, where patterns present high correlation, one has to deal with sparse-coding patterns. Sparse coding [31] is the representation of items by the strong activation of a relatively small set of neurons [30], a different subset of the available neurons for each item when the patterns are uncorrelated. On the one hand, this sparse coding gives the model biological plausibility, since evidence suggests that the brain uses a general sparse-coding strategy [17,34]. This is physiologically relevant, because the amount of energy the brain needs to sustain its function decreases with increasing sparseness [11]. Sparse coding is also favorable for increasing the network capacity, because the cross-talk term between stored patterns decreases [25]. On the other hand, it is difficult to sustain a low rate of activity in an ANN, and a control mechanism must be used [8].

Second, learning a sequence of time-correlated patterns is required by our application. The noise induced by the overlap between patterns is much higher for correlated patterns than for random patterns [15]. This implies that the network capacity drops to an asymptotically vanishing value. Correlations between the training patterns, as happens for a video sequence, worsen the performance of the network, since the cross-talk term can reach high values in this case [37].

The contribution of this paper is twofold. First, we introduce a variant of the pseudo-inverse approach to learn/retrieve a sequence of correlated cyclic patterns (as is the case of a video sequence) using a sparse-coding ANN with a small-world topology. Second, we demonstrate the feasibility of our approach for the storage and retrieval of traffic videos. The rest of the paper is organized as follows. A general solution to the problem based on the pseudo-inverse approach is detailed in Section 2. The proposed model avoids the segmentation and tracking of the involved targets, and also some closely related difficulties. Section 3 presents the experimental framework for two complex traffic videos: in the first one, many vehicles appear in the scene of a Kiev crossroad, and the second video shows a roundabout with light traffic. Results are presented and analyzed for different parameter settings. Finally, Section 4 concludes the paper.

Fig. 1. A schematic representation of a small-world topology (Watts–Strogatz model) with N = 16, K = 4 and ω = 0.0 (left), ω = 0.05 (middle) and ω = 1.0 (right).

2. Proposed model

This section introduces the topology and dynamics of the proposed ANN model, where a variant of the pseudo-inverse rule is used to compute the learning weights. The information measures used to determine the network performance, and the proposed threshold strategy to retrieve patterns with a low activity, are also described.

2.1. Neural coding

We consider a network with $N$ neurons and a fixed number of $K < N$ synaptic connections per neuron. At any given discrete time $t$, the network state is defined by the set of $N$ independent binary neurons $\vec{\tau}^t = \{\tau_i^t \in \{0,1\};\ i = 1,\ldots,N\}$, each one active or inactive, denoted respectively by the state 1 or 0. The aim of the network is to retrieve a sequence of correlated patterns (in our case, the consecutive frames of the video sequence) $\{\vec{\eta}^\mu,\ \mu = 1,\ldots,P\}$ that have been stored during a learning process. Each pattern $\vec{\eta}^\mu = \{\eta_i^\mu \in \{0,1\};\ i = 1,\ldots,N\}$ is a set of biased binary variables with sparseness probability:

$$p(\eta_i^\mu = 1) = a_\mu, \qquad p(\eta_i^\mu = 0) = 1 - a_\mu. \qquad (1)$$

The mean activity for each pattern $\mu$ is $a_\mu = \sum_i^N \eta_i^\mu / N \equiv \langle \eta^\mu \rangle$. The neural activity at any time $t$ is given by the mean $q^t = \sum_i^N \tau_i^t / N \equiv \langle \tau^t \rangle$.

2.2. Network topology

The synaptic couplings between neurons $i$ and $j$ are given by the adjacency matrix $J_{ij} \equiv C_{ij} W_{ij}$, where the topology matrix $C = \{C_{ij} \in \{0,1\}\}$ describes the connection structure of the neural network and $W = \{W_{ij}\}$ is the matrix of learning weights. The topology matrix contains two types of links: local and random ones. The local links connect each neuron to its $K_l$ nearest neighbors in a closed ring, while the random links connect each neuron to $K_r$ others uniformly distributed in the network. Hence, the network degree is $K = K_l + K_r$. The network topology is then characterized by two parameters, the connectivity ratio $\gamma$ [14] and the randomness ratio $\omega$, which are respectively defined by

$$\gamma = K/N, \qquad \omega = K_r/K, \qquad (2)$$

where $\omega$ plays the role of a rewiring probability in the small-world model [9,35,39]. Fig. 1 shows a topology example of the considered ANN.

The storage cost of this network is $|J| = N \times K$ if the matrix $J$ is implemented as an adjacency list, where all neurons have $K$ neighbors.
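As an illustration, a topology matrix $C$ with this structure can be built as follows. This is a minimal sketch, not the authors' code: the function name and seeding are ours, and $K_l$ is kept even here so that the ring neighborhood stays symmetric.

```python
import numpy as np

def small_world_topology(N, K, omega, seed=0):
    """Binary topology matrix C with K_l ring links and K_r = omega*K
    random shortcuts per neuron, so that K = K_l + K_r (Eq. (2))."""
    rng = np.random.default_rng(seed)
    Kr = int(round(omega * K))
    Kl = K - Kr
    if Kl % 2:                # keep the ring neighborhood symmetric
        Kl -= 1
        Kr += 1
    C = np.zeros((N, N), dtype=np.uint8)
    for i in range(N):
        # local links: K_l nearest neighbors on a closed ring
        for d in range(1, Kl // 2 + 1):
            C[i, (i + d) % N] = 1
            C[i, (i - d) % N] = 1
        # random links: K_r further neurons chosen uniformly
        free = np.flatnonzero(C[i] == 0)
        free = free[free != i]
        C[i, rng.choice(free, size=Kr, replace=False)] = 1
    return C
```

With N = 16, K = 4 and ω ∈ {0.0, 0.05, 1.0} this reproduces the three regimes sketched in Fig. 1; the ratios of Eq. (2) follow directly as γ = K/N and ω = K_r/K.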

2.3. Retrieval dynamics

The task of the network is to retrieve the whole learned sequence of patterns (i.e., the full video sequence) starting from an initial neuron state $\vec{\tau}^0$, which is a given seed frame or a state close to it. The retrieval is achieved through the noiseless neuron dynamics:

$$\tau_i^{t+1} = \Theta(h_i^t - \theta_i^t), \qquad (3)$$

$$h_i^t \equiv \frac{1}{K} \sum_j J_{ij} \frac{\tau_j^t - q_j^t}{\sqrt{Q_j^t}}, \quad i = 1,\ldots,N, \qquad (4)$$

where $h_i^t$ denotes the local field at neuron $i$ and time $t$, and $\theta_i$ is its firing threshold. The local mean neural activity is $q_i^t = \langle \tau^t \rangle_i$, and its variance is $Q_i^t = \mathrm{Var}(\tau^t)_i$. The local mean is given by spatial averaging: $\langle f^t \rangle_i \equiv \sum_j C_{ij} f_j^t / K = \sum_{k \in C_i} f_k^t / K$, for any given function $f$ of the neuron sites. Here we used the step function:

$$\Theta(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases} \qquad (5)$$

For convenience, we use in the paper some normalized variables, where the site and time dependence are implicit:

$$\sigma \equiv \frac{\tau - q}{\sqrt{Q}}, \quad q \equiv \langle \tau \rangle, \quad Q \equiv \mathrm{Var}(\tau) = q(1-q), \qquad (6)$$

$$\xi \equiv \frac{\eta - a}{\sqrt{A}}, \quad a \equiv \langle \eta \rangle, \quad A \equiv \mathrm{Var}(\eta) = a(1-a), \qquad (7)$$

where $a$ and $q$ are the pattern and neural activities, respectively. The averages computed in this work run over different ensembles, and are indicated in each case. These variables can be directly translated to those used in most works found in the literature for uniform (non-biased) neurons [15], in the case of $a = 1/2$.
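A possible reading of these dynamics in code is sketched below. One simplifying assumption is ours alone: the global activity $q^t$ and variance $Q^t = q(1-q)$ of Eq. (6) are used in place of the locally averaged $q_i^t$, $Q_i^t$.

```python
import numpy as np

def retrieval_step(tau, J, K, theta=1.0):
    """One parallel update of all neurons, Eqs. (3)-(5).
    tau: (N,) binary state; J: (N, N) coupling matrix J_ij = C_ij W_ij."""
    q = tau.mean()                          # neural activity q^t
    Q = max(q * (1.0 - q), 1e-12)           # Var(tau) = q(1-q), Eq. (6)
    sigma = (tau - q) / np.sqrt(Q)          # normalized states, Eq. (6)
    h = J @ sigma / K                       # local fields h_i^t, Eq. (4)
    return (h >= theta).astype(np.uint8)    # step function, Eq. (5)
```

Iterating this map from a seed frame $\vec{\tau}^0$, with a fixed threshold θ, plays out the retrieval stage.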

2.4. Learning dynamics

To state the proposed learning rule for storing cyclic patterns which are highly correlated, as is the case of a video sequence, we first recall the expression of the weights for the standard case (static and uncorrelated patterns), and then two straightforward extensions: static and correlated patterns, and cyclic and uncorrelated patterns. Cyclic patterns correspond to sequences of patterns of variable activities, with periodic conditions [27]; that is, the pattern after the last one is the first one, so $\xi^{\mu+P} = \xi^\mu$.

If the network learns a set of $P = \alpha K$ static and uncorrelated patterns, $\langle \xi^\mu \xi^\nu \rangle = 0$, these are stored by the network couplings $W_{ij}$ using the classical Hebbian rule [1] for the Hopfield model:

$$W_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu. \qquad (8)$$

This rule for learning the weights can be generalized by introducing a $P \times P$ matrix $A_{\mu\nu}$ in the following way:

$$W_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{P} \xi_i^\mu A_{\mu\nu} \xi_j^\nu. \qquad (9)$$

The standard case, given by Eq. (8), is obtained by using the identity matrix $A^I_{\mu\nu} = \delta_{\mu\nu}$.

For the situation of learning static and correlated patterns, the pseudo-inverse approach [15,36] is a standard method to orthogonalize (i.e. to extract) the correlated patterns, and the matrix $A_{\mu\nu}$ is computed as follows:

$$A^C_{\mu\nu} = \Omega^{-1}_{\mu\nu}, \qquad \Omega_{\mu\nu} \equiv \frac{1}{N} \sum_i^N \xi_i^\mu \xi_i^\nu, \qquad (10)$$

where $\Omega$ is the $P \times P$ pattern overlap matrix.

For the case of learning cyclic (sequential with periodic conditions) and uncorrelated patterns, the former Hebbian rule, Eq. (8), combined with a row-shifting schema of the identity matrix can be applied [1]:

$$A^S_{\mu+1,\nu} = \delta_{\mu\nu},\ \forall \mu \in [1,\ldots,P-1], \qquad A^S_{1,\nu} = \delta_{P,\nu}\ \forall \nu \in [1,\ldots,P]. \qquad (11)$$

In the case of video sequences, we have cycles (or sequences of patterns) where there is a high temporal correlation between the successive frames. For this reason, we propose a heuristic where the learning weights are computed by combining the pseudo-inverse approach with a row-shifting schema like the one used for cyclic patterns. The proposed heuristic for this case (i.e. cycles of correlated patterns) has the following four steps:

1. Obtain the pattern overlap matrix $\Omega$.
2. Compute its inverse matrix $\Omega^{-1}$.
3. Rotate forwards cyclically the rows of $\Omega^{-1}$ to obtain a new matrix $M$.
4. Substitute matrix $A$ by the new matrix $M$ in Eq. (9) to compute the weight matrix $W$ for the video sequence to be learned.

The previous stages are detailed next. First, the $P \times P$ overlap matrix $\Omega$ describing the video sequence is computed by Eq. (10), and its inverse matrix $\Omega^{-1}$ is obtained next. This approach is designed to obtain fixed-point solutions. However, if one is seeking a limit-cycle solution (i.e. retrieving the whole sequence of frames cyclically), then one must benefit from the interactions between one frame and the next one in the video. Therefore, the elements of the $\Omega^{-1}$ matrix are shifted as shown schematically in the following equations:

$$A^V_{\mu+1,\nu} = \Omega^{-1}_{\mu\nu},\ \mu \in [1,\ldots,P-1], \qquad A^V_{1,\nu} = \Omega^{-1}_{P,\nu},\ \forall \nu \in [1,\ldots,P], \qquad (12)$$

obtaining the matrix $A^V$. The previous rule takes into account the dominant terms in the infra-diagonal positions of the matrix $A^V$. The sub-dominant terms account for the orthogonalization of the matrix $\Omega^{-1}$. It is worth noting that the pseudo-inverse rule is not local, because the connections between every two neurons depend on the other neurons; it is also a non-iterative rule: all patterns must be learned before the retrieval process begins.

The learned weight matrix $W$ is now calculated according to the rule in Eq. (9), where $A_{\mu\nu}$ is computed by applying the row-shifting schema given by Eq. (12). The learning stage displays slow dynamics, being stationary within the time scale of the faster retrieval stage, as shown by Eq. (3). A stochastic macrodynamics takes place due to the extensive learning of $P = \alpha K$ patterns, where $\alpha$ is the load ratio.
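The four-step heuristic can be condensed into a few lines of NumPy (a sketch; the function and variable names are ours). The cyclic forward rotation of the rows of $\Omega^{-1}$ in Eq. (12) is exactly a roll along the first axis:

```python
import numpy as np

def learn_video_weights(xi, C):
    """Weight matrix for a cyclic sequence of correlated patterns,
    combining the pseudo-inverse with the row shift of Eq. (12).
    xi: (P, N) normalized patterns (Eq. (7)); C: (N, N) topology matrix."""
    P, N = xi.shape
    Omega = xi @ xi.T / N            # overlap matrix, Eq. (10)
    Oinv = np.linalg.inv(Omega)      # pseudo-inverse step
    A = np.roll(Oinv, 1, axis=0)     # row mu -> mu+1, last row -> first
    W = xi.T @ A @ xi / N            # generalized Hebbian rule, Eq. (9)
    return C * W                     # couplings J_ij = C_ij W_ij
```

The returned matrix plays the role of $J$ in the retrieval dynamics of Eq. (4), with the topology matrix $C$ built as in Section 2.2.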

2.5. Threshold strategies

In order to retrieve patterns with low activity, it is necessary to use an adequate firing threshold. If firing is not controlled, the neural activity can end up higher (lower) than the pattern activity whenever the threshold is too small (large).

The sparser the code, the more sensitive the interval in which the threshold can move becomes [8]. On the one hand, one could use an optimal, manually chosen threshold that maximizes retrieval for each learned pattern and initial condition. This is not a realistic strategy, since the neural network is not supposed to know the patterns during the retrieval process. Thus, a simple and convenient solution is to use a fixed value for the threshold. The value $\theta_i = 1$ for the threshold was obtained experimentally for a sparseness ratio of $a \approx 0.1$, which is the mean sparseness of the frames in the analyzed videos.

2.6. The information measures

In order to evaluate the network retrieval performance, two measures are considered: the global overlap and the load ratio. The overlap is used as a temporal measure of information, which is adequate to describe instantaneously the network's ability to retrieve each frame of the video. In this case, the overlap $m_\mu^t$ between the neural state $\sigma^t$ at time $t$ and the frame $\xi^\mu$ is:

$$m_\mu^t \equiv \frac{1}{N} \sum_i^N \xi_i^\mu \sigma_i^t, \qquad (13)$$

which is the normalized statistical correlation between the learned frame $\eta_i^\mu$ and the neural state $\tau_i^t$ at a given iteration $t$ in the sequence cycle. One lets the network evolve according to Eqs. (3) and (4) and measures the overlap between the network states and the video frames running over a whole sequence cycle


of the learned video. The neural states $\{\vec{\tau}^t,\ t = 1,\ldots,P\}$ are compared cyclically with the learned frames $\{\vec{\eta}^\mu,\ \mu = 1,\ldots,P\}$. The network starts in an initial condition close to a given frame, say $\tau^{t=1} \approx \eta^{\mu=1}$, so that the time and frame labels are synchronized, and the overlap for each frame at cycle $c = 0,1,2,\ldots$ is

$$m_\mu^c \equiv \frac{1}{N} \sum_i^N \xi_i^\mu \sigma_i^{\mu + cP}. \qquad (14)$$

The global overlap is defined as

$$m^c = \langle m_\mu^c \rangle \equiv \frac{1}{P} \sum_{\mu=1}^P m_\mu^c \qquad (15)$$

and it measures the network's ability to retrieve the whole sequence of patterns. After a transient period of time, the network dynamics converges to a stationary regime where the global overlap $m^c$ does not change in subsequent cycles. When this global overlap between the whole set of patterns (i.e. the video sequence) and their corresponding neural states is $m = 1$, the network has retrieved the complete sequence without noise. In this case, all the network states correspond perfectly to the frames of the video. When the global overlap $m$ is zero, the network carries no macroscopic order. In this case, the video cannot be retrieved. For intermediate values $0 < m < 1$, the video can be partially recovered with a given level of noise (as $m$ increases, a higher number of frames can be perfectly retrieved).

Besides the overlap, we are also interested in the load ratio $\alpha \equiv P/K$, which accounts for the storage capacity of the network. This ratio depends on the size of the video, which is $P \times N$ (i.e. the number of frames times their spatial resolution, where this resolution coincides with the number of neurons), and the amount of physical memory necessary to store the video, which is $K \times N$, representing the adjacency list sizes (see the network topology subsection).

When the number of stored patterns increases, the noise due to interference between patterns also increases and the network is no longer able to retrieve them. Thus, the overlap $m$ goes to zero. A good trade-off between negligible noise (i.e. $1 - m \approx 0$) and a large video sequence (i.e. a high value of $\alpha$) is desirable for any practical-purpose model.
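The two performance measures can be sketched as follows (the helper names are ours; $\sigma$ and $\xi$ are the normalized states and patterns of Eqs. (6)-(7)):

```python
import numpy as np

def frame_overlaps(sigma_cycle, xi):
    """Per-frame overlaps m_mu^c of Eq. (14): sigma_cycle[mu] is the
    normalized network state when frame mu is expected within cycle c."""
    N = xi.shape[1]
    return np.einsum('mi,mi->m', xi, sigma_cycle) / N

def global_overlap(sigma_cycle, xi):
    """Global overlap m^c of Eq. (15): average of m_mu^c over frames."""
    return frame_overlaps(sigma_cycle, xi).mean()

def load_ratio(P, K):
    """Load ratio alpha = P/K, the storage-capacity measure."""
    return P / K
```

If the network states coincide with the stored frames over the whole cycle, the global overlap evaluates to 1, matching the noiseless-retrieval case described above.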

3. Experimental evaluation

The learning times needed to store our traffic video sequences were very high for the network considered. In our experiments, this time was highly dependent on the parameter $K$, as well as on the number of learned patterns $P$, and it varied between 100 min and nearly 2000 min depending on the network degree considered. In fact, the learning time is of order $O(N \cdot K \cdot P^2)$, according to Eq. (12). That is why we have used only two video sequences for our experiments: the first one, Kiev, corresponds to a densely transited crossroad zone in Kiev, Ukraine; and the second one, roundabout, corresponds to a roundabout area in a Spanish city. Different model parameter configurations were tested for both sequences to gain more insight into how the network behaved during the learning and retrieval of correlated cyclic frames. The Kiev video sequence was captured by a live camera demo site from the Axis company: http://www.axis.com/es/solutions/video/gallery.htm.

It was recorded by an Axis Q1755 Network Camera as an AVI video and consisted of 1835 frames at 25 frames per second, that is, 73.4 s of recording. The original roundabout video sequence consisted of about 15 min of AVI video recorded with a conventional camera at 30 fps, and we used only 650 frames, that is, 21.7 s of video, for our experiments.

For the two analyzed sequences, the video pre-processing included three stages:

(1) The frames of the initial color video sequence were converted into binary patterns and stored as PNG images with dimensions 384×356 black-and-white pixels for the Kiev sequence and 640×480 pixels for the roundabout sequence.

(2) The Kiev frames were resized to 96×89 = 8544 pixels and the roundabout frames to 80×106 pixels, in order to get a reasonable network size for the simulations.

(3) A new subsequence of frames was created by uniformly sub-sampling the sequence obtained in the previous stage using a natural factor f, where f ≥ 1 (i.e., we build the video subsequence with the original frames 1, 1+f, 1+2f, …). The goal is to ensure that the network is able to recover the whole stored sequence of frames. Consequently, we start testing with f = 1, then f = 2, and so on, until the condition holds.
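The three stages above can be sketched with plain NumPy. This is an illustration only: the paper used the Octave image package, and the binarization threshold and nearest-neighbor resizing here are our own assumptions.

```python
import numpy as np

def preprocess_frames(gray_frames, out_shape, f=1, threshold=128):
    """Stages (1)-(3): binarize each grayscale frame, resize it by
    nearest-neighbor sampling to out_shape = (H, W), and sub-sample
    the sequence with a natural factor f (frames 1, 1+f, 1+2f, ...)."""
    H, W = out_shape
    patterns = []
    for frame in gray_frames[::f]:                      # stage (3)
        binary = (frame >= threshold).astype(np.uint8)  # stage (1)
        rows = np.linspace(0, frame.shape[0] - 1, H).astype(int)
        cols = np.linspace(0, frame.shape[1] - 1, W).astype(int)
        patterns.append(binary[np.ix_(rows, cols)].ravel())  # stage (2)
    return np.array(patterns)    # (P, N) with N = H*W neurons
```

For the Kiev sequence this would be called with out_shape = (96, 89) and f = 5, yielding P = 367 patterns of N = 8544 neurons each.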

For the simulations we have used a system with an Intel Core 2 Duo CPU E6750 at 2.66 GHz and 2 GB of physical memory. The Octave image package [29] was used to process the image files into text files with the 0/1 binary format required for the neuron states. The network parameters used in the Kiev simulations were N = 8544, K = 4250 and θ_i = 1.0, for a sparseness a = 0.10. For this network size, the video sequence was recovered using every f = 5 frames, that is, 1835/5 = 367 frames. For the roundabout simulations a similar network was used, with N = 8480, K = 4240 and θ_i = 1.0, for a sparseness a = 0.07, recovering the video sequence using every f = 5 frames, that is, 650/5 = 130 frames. The video output comparing the original with the retrieved frames, as well as the frames in text format, can be found at http://dl.dropbox.com/u/11890025/video5.zip for the Kiev sequence and at http://dl.dropbox.com/u/11890025/roundabout.zip for the roundabout sequence.

Figs. 2 and 3 show some sample post-processed frames of the stored and successfully retrieved video sequences for the Kiev and roundabout sequences, respectively. In Fig. 2 the seed used to start the retrieval was a noisy frame (top-left panel), with initial overlap $m_{\mu=1}^{c=0} = 0.5$. During the first cycle, the network corrects the wrong pixels (frame numbers 1, 21, 41 and 61 are presented in the top panels), reaching $m^{c=0} \approx 0.93$; see Fig. 4. After a complete cycle the overlap reaches the stationary value $m^{c=1} \approx 0.99$ (the same frames are shown in the bottom panels for the second cycle).

For the roundabout sequence in Fig. 3 the seed was a noisy frame (top-left panel), with initial overlap $m_{\mu=1}^{c=0} = 0.4$. Frame numbers 1, 11, 21 and 31 are presented in the top panels for $m^{c=0} \approx 0.97$, and in the bottom panels for the second cycle, with a stationary value $m^{c=1} \approx 0.98$; see Fig. 5.

3.1. Influence of the topology on the global overlap and the learning time

Using the previous network parameter setting $(N, K, \theta_i, a)$, Table 1 shows the dependence of the global overlap and the learning time on the random-connections parameter $\omega$ at the learning stage.

As can be observed, there is no significant difference between the processing times for learning the video with different values of the $\omega$ parameter. This slight difference is only due to the larger time needed to construct random networks compared with local networks. The retrieval time was the same in all cases: around 5 min for the Kiev sequence and a minute and a half for the roundabout sequence. In all cases, the respective memory usages for the learning and retrieval stages are about 14.3% and 10.4% of the whole computer memory.

One can conclude that, with a network with a randomness value of $\omega = 0.4$, the retrieval of the Kiev video sequence is possible while saving considerably on wiring costs, as the small-world topology suggests. It is also interesting to remark in Table 1 that the transition from the confusion state (i.e. $m \approx 0$) to the retrieval state (i.e. $m \approx 1$) for the Kiev traffic video happened around $\omega = 0.35$. This is related to an effective percolation of the information over the whole network. Although the network is always connected, for smaller values of the randomness parameter the synaptic strengths are not strong enough to percolate the information from some pixels to every region of the neuron states. For the roundabout video sequence, the randomness value for the transition from the confusion to the retrieval state, $\omega = 0.7$, is higher than in the Kiev video. This effect could be due to the temporal correlation between frames, which is smaller for the roundabout video.

Fig. 2. Some retrieved sample frames (from left to right, frame numbers 1, 21, 41 and 61) of the Kiev crossroad traffic video sequence for f = 5. Initial overlap m1 = 0.5. Top panels: first cycle. Bottom panels: second cycle.

Fig. 3. Some retrieved sample frames (from left to right, frame numbers 1, 11, 21 and 31) of the roundabout traffic video sequence for f = 5. Initial overlap m1 = 0.4. Top panels: first cycle. Bottom panels: second cycle.

Table 1. Randomness ratio versus global overlap and learning time for the Kiev crossroad and roundabout video sequences.

Kiev crossroad
ω     m         Learning time
0.0   0.32117   499 m 39 s
0.3   0.33677   500 m 08 s
0.4   0.99751   500 m 25 s
0.5   0.99767   501 m 30 s
1.0   0.94742   504 m 27 s

Roundabout
ω     m         Learning time
0.0   0.32060   100 m 19 s
0.4   0.20104   101 m 01 s
0.6   0.05548   101 m 43 s
0.7   0.98344   102 m 50 s
1.0   0.99732   104 m 51 s

We also experimented with a simpler "shifted-diagonal" Hebbian learning matrix [27] replacing the pseudo-inverse rule (see Eqs. (9)-(11)). The maximal number of frames that could be retrieved for the Kiev video with N = 8544, K = 4250, ω = 0.5 and m ≈ 1 was about P = 16. This choice is surely not appropriate for strongly correlated patterns, and other learning rules, like the covariance rule [7] or the Bayesian rule [21], have been proposed to maximize the signal-to-noise ratio for a class of associative memories. A comparison with these models might be studied in future work.

3.2. Robustness of the model with respect to the frame activity

We tested the robustness of the model (i.e. how closely the curves of average pattern and neural activities overlap along the frames of the video sequence) for a given network configuration: N = 8544, K = 4250 and ω = 0.4. Fig. 4 shows that the model is robust against a variable frame activity level, where the normalized activity (i.e. sparseness) of the frames, a^μ/a, varies in the range 0.4 < a^μ/a < 1.6. The graphic can be partitioned into three regions according to the numbering of the frames. In the first region, where m (black line) rises from 0.55 to around 0.95 (from the first frame to around frame 20), the average pattern (red line) and neural (blue line) activities are uncorrelated, and the pattern activity is much higher than the temporal neural activity. In the second region, where the value of m remains stable around 0.95 (frames 21 to 225), the average pattern and neural activities are highly correlated, but the pattern activity is slightly larger than the temporal neural activity. Finally, in the third region, where m equals one from frame 226 to the end of the video, the pattern and neural activities coincide exactly despite the significant changes in frame activity over time. The global overlap for the cycle is m^c = 0.93.
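The two quantities plotted against the frames can be computed as follows (a minimal sketch assuming 0/1 pixel coding and the standard sparse-coding overlap; the function names are ours):

```python
import numpy as np

def overlap(pattern, state):
    """Normalized overlap m between a stored binary frame (0/1 pixels)
    and the network state: 1 for perfect retrieval, ~0 for an
    uncorrelated state.  a = pattern.mean() is the frame activity."""
    a = pattern.mean()
    return float(((pattern - a) * state).sum() / (a * (1.0 - a) * pattern.size))

def activity(state):
    """Fraction of active neurons (the neural activity q in the text)."""
    return float(state.mean())
```

Subtracting the activity a inside the sum is what keeps the overlap near zero for sparse but uncorrelated states, so the measure remains meaningful even when the frame activity fluctuates strongly.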

A similar curve for the roundabout sequence is presented in Fig. 5 for the network configuration N = 8480, K = 4240 and ω = 0.7. The overall behavior is similar to that of the Kiev sequence.

[Figure 4 here. Curves: m^μ (overlap), q^μ/⟨q⟩ (normalized neural activity), a^μ/⟨a⟩ (normalized pattern activity); x-axis: frame (0–350); y-axis: 0–1.6. In-plot parameters: N=8544, K=4250, m0=0.55, frames=1835/5=367, ω=0.5.]

Fig. 4. Kiev crossroad sequence: plot of pattern and neural activities against frames for N = 8544, K = 4250 and ω = 0.4. Initial overlap m_{μ=1} = 0.5.

[Figure 5 here. Curves: q^μ/⟨q⟩ (normalized neural activity), a^μ/⟨a⟩ (normalized pattern activity); x-axis: frame (0–125); y-axis: 0–1.6. In-plot parameters: N=8480, K=4240, m0=0.4, frames=650/5=130, ω=0.7.]

Fig. 5. Roundabout: plot of pattern and neural activities against frames for N = 8480, K = 4240 and ω = 0.7. Initial overlap m_{μ=1} = 0.4.

M. Gonzalez et al. / Neurocomputing 74 (2011) 2361–2367

3.3. Other related experiments

We summarize some other experimental variations, tested on the Kiev video, with respect to the previous parameters of the model, and their corresponding results. In particular, the following cases were considered:

(a) For a network topology of size N = 8544 and K = 6000, with ω = 1, the network was able to retrieve 1835/3 = 612 frames with a mean overlap m = 0.99583. In this case, the processing time for learning was 1963 m, with a memory usage of 20.4%. For retrieving the complete video, the memory usage was 19.9% with a processing time of 10 m 33 s. These results are found at: http://dl.dropbox.com/u/11890025/video3.zip.

(b) By setting the parameters N = 8544 and ω = 0.5:
– when P = 367 and K = 4250, an overlap value of m* = 0.99767 is achieved, meaning that the attractor basin of the network is large and the model is fault-tolerant (i.e. the video is retrieved starting from a noisy initial condition m0);
– when P = 367 and K = 3000, m decays in 3 cycles from m ≈ 1 to m ≈ 0;
– when P = 612 and K = 4250, m decays to 0 faster (in 3 frames); and
– when P = 367 and ω = 0.3, m decays to m ≈ 0.3 in 1 cycle.

4. Conclusion

We used a Hopfield-type attractor neural network (ANN) with a small-world connectivity distribution to learn and retrieve a sequence of highly correlated patterns. For this network model, a new weight learning heuristic which combines the pseudo-inverse approach with a row-shifting schema has been presented. The influence of the random connectivity ratio on retrieval quality and learning time has been studied. Our approach has been successfully tested for different combinations of the involved parameters on a complex traffic video sequence. Moreover, it was shown to be robust with respect to highly variable frame activity.

An additional conclusion of our study is that the more spatially correlated the frames are on average, the smaller the range of the interaction (randomness parameter ω) that optimizes the retrieval of the video. The opposite also holds: the less spatially correlated the patterns are, the higher the value of ω should be. For instance, if the frames contain large regions with high activity (e.g., a huge truck or bus in a corner) against a bulk of still background, then they are strongly spatially correlated. On the other hand, the threshold strategy used in the model is fundamental: the dependence of θ on the neural activity (as well as on the pattern activity) is set in such a way that the network dynamics is self-controlled and needs no human intervention. For example, with the typical activity value a = 0.1 of our traffic videos, we set θ ≈ 1 in most of the network. For a uniform activity degree in the frames (i.e. a = 1/2), no threshold is needed (θ ≈ 0). Finally, for extremely sparse codes (a → 0), the threshold increases as θ ≈ 1/√a.

Automatic video-based traffic monitoring systems are an alternative to loop detectors. Video-based systems provide updated global information on the analyzed traffic scene as well as specific information about the tracked vehicles. An interesting application of such systems is content-based traffic video retrieval, where, given a query video, it is possible to retrieve a similar video from a database using some type of features extracted from the videos (e.g. textural information, motion trajectories of cars, etc.). This can be useful for surveillance applications where we are interested in detecting certain events in the video (e.g. accidents, congestions, etc.). To achieve this goal, most approaches follow a feature-extraction pipeline which needs to segment the cars in the video and to track them individually. By contrast, using a holistic method like the one proposed in this paper, we can retrieve a complete video from a query frame, provided this frame represents a noisy scene of the video.
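One simple way to realize the self-controlled threshold discussed above is a feedback loop that moves θ so that the instantaneous neural activity tracks the pattern activity a (an illustrative scheme of ours, not the exact update rule of the paper):

```python
import numpy as np

def update_state(W, state, theta):
    """One parallel update of binary (0/1) neurons with threshold theta."""
    return (W @ state > theta).astype(float)

def self_control(theta, state, a, gain=0.1):
    """Raise theta when the network is too active, lower it when too
    quiet, so the activity settles near the target sparseness a
    without any manual tuning."""
    return theta + gain * (state.mean() - a)
```

With a = 0.1, for example, the threshold drifts towards the value that keeps roughly 10% of the neurons firing, in the same spirit as the θ values quoted in the text.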

As our approach is holistic, in the sense that no segmentation or feature extraction from the vehicles is required, we have to consider other holistic approaches applied to traffic videos for comparison purposes. The mentioned papers by Chan and Vasconcelos [5] and Xie et al. [38] do not segment the vehicles in the video, but they extract some global features from it (in particular, the complete motion information contained in the video), which are used for the retrieval task. They retrieve instances of traffic patterns using query videos, while in our approach the video can be retrieved using only a single (possibly noisy) query frame. Moreover, these two compared papers do not quantitatively measure the video retrieval quality as we do using the global overlap.

To the best of our knowledge, this is the first application of small-world ANNs and a row-shifting pseudo-inverse method to this specific content-based video retrieval problem. Our proposed solution is suitable for the mentioned traffic application since it produces accurate retrieval results in reasonable time. However, the required learning times are still very large, and the system needs improvement to be competitive with the classical methods that segment the scene and track the moving targets. Moreover, our proposal is currently suited only to traffic applications where the learning stage can be carried out off-line. Consequently, the use of complementary, more efficient strategies to compress the amount of memory required to store the pattern vectors, such as look-up tables [20] or hashing techniques like LSH [13], will be considered as future work.
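As a pointer for that future work, bit-sampling LSH over binary frames could look like the following (a hypothetical sketch; the number of sampled pixels and the single-table layout are our assumptions, not results from [13]):

```python
import numpy as np

def lsh_signature(frame, bit_indices):
    """Bit-sampling LSH for a binary (0/1) frame: frames at small
    Hamming distance agree on most pixels, so they fall into the same
    bucket with high probability."""
    return tuple(int(b) for b in frame[bit_indices])

# one hash table: sample 16 of the N = 8544 pixels used in our experiments
rng = np.random.default_rng(0)
bit_indices = rng.choice(8544, size=16, replace=False)
```

Several such tables with independent pixel samples would then be combined, trading a small loss of recall for a large reduction in the memory touched per query.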

Acknowledgments

This research has been partially supported by the Spanish projects TIN2008-06890-C02-02 and TIN-2007-65989. M. Gonzalez thanks EM ECW Lot 20 for financial support. We thank F.B. Rodriguez for useful discussions.

References

[1] D.J. Amit, Modeling Brain Function: The World of Attractor Neural Networks, Cambridge University Press, 1989.

[2] J. Bohland, A. Minai, Efficient associative memory using small-world architecture, Neurocomputing 38–40 (2001) 489–496.

[3] E. Bas, Road and traffic analysis from video, Master's Thesis, Koc University, Turkey, 2007.

[4] A. Bovik, J. Gibson (Eds.), Handbook of Image and Video Processing, Academic Press, 2000.

[5] A.B. Chan, N. Vasconcelos, Classification and retrieval of traffic video using auto-regressive stochastic processes, in: Proceedings of the IEEE Intelligent Vehicles Symposium, 2005.

[6] B. Coifman, D. Beymer, P. McLauchlan, J. Malik, A real-time computer vision system for vehicle tracking and traffic surveillance, Transp. Res. Part C Emerg. Technol. 6 (1998) 271–288.

[7] P. Dayan, D.J. Willshaw, Optimising synaptic learning rules in linear associative memories, Biol. Cybernet. 65 (1991) 253–265.

[8] D. Dominguez, D. Bolle, Self-control in sparsely coded networks, Phys. Rev. Lett. 80 (1998) 2961.

[9] D. Dominguez, M. Gonzalez, E. Serrano, F.B. Rodriguez, Structured information in small-world neural networks, Phys. Rev. E 79 (2009) 021909.

[10] D. Dominguez, K. Koroutchev, E. Serrano, F.B. Rodriguez, Information and topology in attractor neural networks, Neural Comput. 19 (2007) 956–973.

[11] P. Foldiak, D. Endres, Sparse coding, Scholarpedia 3 (1) (2008) 2984.

[12] D.A. Forsyth, J. Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2003.

[13] A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in: Proceedings of the 25th Very Large Database Conference (VLDB'99), 1999.

[14] J.P.L. Hatchett, I. Perez Castillo, A.C.C. Coolen, N.S. Skantzos, Dynamical replica analysis of disordered Ising spin systems on finitely connected random graphs, Phys. Rev. Lett. 95 (2005) 117204.

[15] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.

[16] J.W. Hsieh, S. Hao, Y.S. Chen, W.F. Hu, Automatic traffic surveillance systems for vehicle tracking and classification, in: Proceedings of the IEEE Conference on Intelligent Transportation Systems, vol. 7, 2006, pp. 175–187.

[17] C. Johansson, A. Lansner, Imposing biological constraints onto an abstract neocortical attractor network model, Neural Comput. 19 (2007) 1871–1896.

[18] Y.K. Jung, Y.S. Ho, A feature-based vehicle tracking system in congested traffic video sequences, in: Proceedings of PCM'01, Lecture Notes in Computer Science, vol. 2195, 2001, pp. 190–197.

[19] V. Kastrinaki, M. Zervakis, K. Kalaitzakis, A survey of video processing techniques for traffic applications, Image Vision Comput. 21 (2003) 359–381.

[20] A. Knoblauch, G. Palm, F.T. Sommer, Memory capacities for synaptic and structural plasticity, Neural Comput. 22 (2010) 289–341.

[21] A. Knoblauch, Optimal synaptic learning in non-linear associative memory, in: Proceedings of the IJCNN, 2010, pp. 3205–3211.

[22] K. Koroutchev, E. Koroutcheva, Bump formation in a binary attractor neural network, Phys. Rev. E 73 (2006) 026107.

[23] B. Li, R. Chellappa, A generic approach to simultaneous tracking and verification in video, IEEE Trans. Image Process. 11 (2002) 530–544.

[24] C. Li, G. Chen, Stability of a neural network model with small-world connections, Phys. Rev. E 68 (2003) 052901.

[25] N. Masuda, K. Aihara, Global and local synchrony of coupled neurons in small-world networks, Biol. Cybernet. 90 (2004) 302–309.

[26] P.N. McGraw, M. Menzinger, General theory of nonlinear flow-distributed oscillations, Phys. Rev. E 68 (2003) 047102.

[27] C. Molter, U. Salihoglu, H. Bersini, Storing static and cyclic patterns in a Hopfield neural network, Technical Report, Universite Libre de Bruxelles, 2005.

[28] L. Morelli, G. Abramson, M. Kuperman, Auto-associative memory in a small-world neural network, Eur. Phys. J. B 38 (2004) 495–500.

[29] GNU Octave homepage, http://www.gnu.org/software/octave/, 2006.

[30] B.A. Olshausen, D.J. Field, Sparse coding of sensory inputs, Curr. Opin. Neurobiol. 14 (2004) 481–487.

[31] M. Okada, Notions of associative memory and sparse coding, Neural Networks 9 (1996) 1429–1458.

[32] D.R. Paula, A.D. Araujo, J.S. Andrade Jr., H.J. Herrmann, J.A.C. Gallas, Periodic neural activity induced by network complexity, Phys. Rev. E 74 (2006) 017102.

[33] B. Ristic, S. Arulampalam, N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004.

[34] E. Rolls, A. Treves, Neural Networks and Brain Function, Oxford University Press, 1998.

[35] Y. Roudi, A. Treves, Localized activity profiles and storage capacity of rate-based auto-associative networks, Phys. Rev. E 73 (2006) 061904.

[36] A. Storkey, Increasing the capacity of a Hopfield network without sacrificing functionality, in: Proceedings of ICANN'97, Lecture Notes in Computer Science, vol. 1327, 1997.

[37] T.P. Trappenberg, Fundamentals of Computational Neuroscience, Oxford University Press, 2002.

[38] D. Xie, W. Hu, T. Tan, J. Peng, Semantic-based traffic video retrieval using activity pattern analysis, in: Proceedings of the International Conference on Image Processing (ICIP'04), 2004, pp. 693–696.

[39] D.J. Watts, S.H. Strogatz, Collective dynamics of small-world networks, Nature 393 (1998) 440–442.

Mario Gonzalez is a postgraduate student at the Escuela Politecnica Superior (EPS), Universidad Autonoma de Madrid (UAM), Spain. His research includes neural networks and information theory. He is a member of GNB-UAM (Grupo de Neurocomputacion Biologica). Presently at FEUP, Portugal.

David Dominguez received his M.Sc. (1987) and Ph.D. (1993) in physics from the UFRGS, Porto Alegre, Brazil. He worked at the KUL, Leuven, Belgium, and at INTA, URJC and UCM, Madrid, Spain. Since 2001 he has been a professor at the EPS, UAM.

Angel Sanchez received his B.Sc. (1986) and Ph.D. (1990) degrees, both in Computer Science, from the Technical University of Madrid, Spain. He is an Associate Professor in the Department of Computing Sciences at Universidad Rey Juan Carlos, Madrid. His current research interests involve Computer Vision applications, Pattern Recognition, Biometrics and Face Recognition. He is a member of the IEEE Computer Society and the Spanish Association for Pattern Recognition and Image Analysis.