Gamma-filter self-organising neural networks for unsupervised sequence processing

P.A. Estevez, R. Hernandez, C.A. Perez and C.M. Held

Adding γ-filters to self-organising neural networks for unsupervised sequence processing is proposed. The proposed γ-context model is applied to self-organising maps and neural gas networks. The γ-context model is a generalisation that includes the previously published merge-context model as a particular case. The results show that the γ-context model outperforms the merge-context model in terms of temporal quantisation error and state-space representation.

Introduction: Self-organising feature maps (SOMs) [1] have been used extensively for clustering and data visualisation of static data. Recently, SOMs have been extended to sequence processing by using recurrent connections and temporal or spatial context representations; see [2] for a review. Engineering applications include the analysis of temporally or spatially connected data, such as DNA chains, biomedical signals, speech and image processing, and time series in general. In the merge SOM (MSOM) model [3], each neuron is associated with a weight vector and a context vector. The merge context model can be combined with neural gas (NG) [4], creating the merge neural gas (MNG) model [5]. In this Letter, adding γ-filters [6] to self-organising neural networks for unsupervised sequence processing is proposed. The γ-context model is applied to extend SOM and NG networks.

Method: We describe our method using the NG network model. A reference vector w_i ∈ R^d is associated with the ith neuron, for i = 1, ..., M, where d is the dimensionality of the input space and M is the number of neurons. A set of γ-filter context vectors C = {c_1^i, c_2^i, ..., c_K^i} is added to each ith neuron, where c_k^i ∈ R^d for k = 1, ..., K; i = 1, ..., M, and K is the highest γ-filter order. When a data vector x(n) is presented to the network at time n, the best matching unit (BMU), I_n, is determined by computing the closest neuron according to the following recursive distance criterion or distortion error:

    d_i(n) = \alpha_0 \lVert x(n) - w_i \rVert^2 + \sum_{k=1}^{K} \alpha_k \lVert C_k(n) - c_k^i \rVert^2    (1)

where the parameters α_k, for k = 0, ..., K, control the contribution of the reference and context vectors. Because the context construction is recursive, setting α_0 > α_1 > ... > α_K > 0 is recommended. The K descriptors of the current context, C_k(n), are recursively computed as the linear combination of the contexts of the kth and (k − 1)th order γ-filter associated with the BMU obtained at the previous time step, I_{n−1}, i.e.

    C_k(n) = \beta\, c_k^{I_{n-1}} + (1 - \beta)\, c_{k-1}^{I_{n-1}}, \quad \forall k = 1, ..., K    (2)

where c_0^{I_{n-1}} = w_{I_{n-1}}, and the initial conditions c_k^{I_0}, ∀ k = 1, ..., K, are set randomly. The parameter β ∈ (0, 1) provides a mechanism to decouple depth, D, and resolution, R, from the filter order. Depth indicates how far into the past the memory stores information. Resolution indicates the degree to which the information relative to the individual elements of the sequence is preserved. The mean memory depth for a γ-filter of order K is D = K/(1 − β), and its resolution is R = 1 − β [6]. By increasing K, we can achieve a greater depth without compromising resolution. It is easy to verify that for K = 1 in (2), i.e. when only a single γ-filter stage is used, the merge context model used in MSOM [3] and MNG [5] is recovered.
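To make the recursions concrete, the following NumPy sketch computes the context descriptors of (2) and the distortion of (1); the array layout, shapes and function names are our own illustration, not part of the original formulation.

    import numpy as np

    def context_descriptors(W, C, prev_bmu, beta):
        # Context descriptors C_k(n) of (2):
        #   C_k(n) = beta * c_k^{I_{n-1}} + (1 - beta) * c_{k-1}^{I_{n-1}},  with c_0 = w_{I_{n-1}}
        # W: (M, d) reference vectors, C: (M, K, d) gamma-filter contexts,
        # prev_bmu: index I_{n-1}, beta: gamma-filter parameter in (0, 1).
        M, K, d = C.shape
        desc = np.empty((K, d))
        prev = W[prev_bmu]                        # c_0^{I_{n-1}} = w_{I_{n-1}}
        for k in range(K):
            desc[k] = beta * C[prev_bmu, k] + (1.0 - beta) * prev
            prev = C[prev_bmu, k]
        return desc

    def distortion(x, W, C, desc, alpha):
        # Distortion error d_i(n) of (1) for every neuron i.
        # x: (d,) input x(n), desc: (K, d) descriptors C_k(n), alpha: (K+1,) weights.
        d_ref = np.sum((x - W) ** 2, axis=1)               # ||x(n) - w_i||^2
        d_ctx = np.sum((desc[None] - C) ** 2, axis=2)      # ||C_k(n) - c_k^i||^2
        return alpha[0] * d_ref + d_ctx @ alpha[1:]

    # The BMU is then I_n = int(np.argmin(distortion(x, W, C, desc, alpha))).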

When an input vector x is presented, we determine the BMU, the second closest neuron, and so on. The ranking index associated with each ith neuron is denoted as r_i ∈ {0, ..., M − 1}. The neighbourhood function is defined in input space as h_λ(r_i) = exp(−r_i/λ), ∀ i = 1, ..., M, where λ defines the neighbourhood size. The cost function for the γ-NG model is defined as:

    E_{\gamma\text{-}ng}(w, c_1, ..., c_K, \lambda) = \frac{1}{2H(\lambda)} \sum_{i=1}^{M} \int_{V} P(x)\, h_\lambda(r_i)\, d_i(x, w_i, c_1^i, ..., c_K^i)\, dx    (3)

where V ⊆ R^d is the input space, P(x) is the probability distribution of data vectors over the manifold V, h_λ is the neighbourhood function, d_i is the distortion error, and H(λ) is a normalisation factor, H(λ) = Σ_{i=1}^{M} h_λ(r_i). When α_k = 0, ∀ k = 1, ..., K, in (1), the γ-NG cost function (3) reduces to the original NG cost function [4]. Minimising (3) by stochastic gradient descent, we obtain the following update rules for the reference and context vectors:

    \Delta w_i = \epsilon_w(n)\, h_\lambda(r_i)\, (x(n) - w_i), \qquad \Delta c_k^i = \epsilon_c(n)\, h_\lambda(r_i)\, (C_k(n) - c_k^i)    (4)

where the learning rates ε_w, ε_c, as well as λ, decrease monotonically from an initial value, v_0, to a final value, v_f, as v(n) = v_0 (v_f/v_0)^{n/n_max}, where n_max is the maximum number of iterations.
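For illustration, the decay schedule shared by ε_w, ε_c and λ can be written as a one-line helper (the function name is ours):

    def schedule(v0, vf, n, n_max):
        # v(n) = v0 * (vf / v0) ** (n / n_max), decaying from v0 to vf as n goes 0 -> n_max.
        return v0 * (vf / v0) ** (n / n_max)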

Temporal quantisation error (TQE) [7] is used to assess the representation of temporal dependencies in the map. The quantisation error of a neuron corresponds to the standard deviation associated with its receptive field. The TQE of the entire map for t time-steps back into the past is defined as the average quantisation error measured over all neurons.
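As a sketch only, assuming that the receptive field of neuron i at lag t is the set of inputs x(n − t) over all times n at which i was the BMU (our reading of the description above, not a definition taken from [7]), the TQE could be estimated as follows:

    import numpy as np

    def temporal_quantisation_error(X, bmu, t, M):
        # X: (N, d) input sequence, bmu: (N,) BMU index I_n for every step,
        # t: time steps back into the past, M: number of neurons.
        errors = []
        for i in range(M):
            wins = np.where(bmu == i)[0]
            wins = wins[wins >= t]              # keep times with a valid sample t steps back
            if wins.size == 0:
                continue
            field = X[wins - t]                 # receptive field of neuron i at lag t
            errors.append(np.std(field, axis=0).mean())
        return float(np.mean(errors)) if errors else 0.0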

Recurrence plots are useful tools for visualising recurrences of dynamical systems in the phase space [8]. A recurrence matrix is defined whose ijth component is one if the states of the system at times i and j are similar up to an error ε, and zero otherwise. The parameter ε is fixed so that the recurrence rate is 0.02 [8]. The γ-NG model can be interpreted as a quantised reconstruction of the phase space in the (K + 1)-dimensional space by defining the vector r(t) = [w_{I_n}, c_1^{I_n}, ..., c_K^{I_n}].
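A small sketch of the recurrence matrix described above, with the threshold ε fixed from the desired recurrence rate of 0.02 (the quantile-based thresholding and names are our own illustration):

    import numpy as np

    def recurrence_matrix(states, rate=0.02):
        # states: (T, D) trajectory, e.g. r(t) = [w_{I_n}, c_1^{I_n}, ..., c_K^{I_n}] stacked per step.
        # Returns the binary matrix R_ij = 1 if ||states[i] - states[j]|| <= eps,
        # with eps chosen so that the fraction of ones is approximately `rate`.
        dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=2)
        eps = np.quantile(dist, rate)
        return (dist <= eps).astype(np.uint8)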

γ-NG algorithm:

1. Randomly initialise the reference vectors w_i and context vectors c_k^i, i = 1, ..., M; k = 1, ..., K.
2. Present the input vector x(n) to the network.
3. Calculate the context descriptors C_k(n) using (2).
4. Find the BMU, I_n, using (1).
5. Update the reference and context vectors using (4).
6. Set n ← n + 1.
7. If n < L, go back to step 2, where L is the cardinality of the data set.
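A compact, self-contained sketch of one pass over steps 2-7 follows; the helper name, in-place updates and default arguments are illustrative choices of ours rather than the exact implementation of the Letter.

    import numpy as np

    def gamma_ng_epoch(X, W, C, alpha, beta, eps_w, eps_c, lam, prev_bmu=0):
        # X: (L, d) input sequence, W: (M, d) reference vectors,
        # C: (M, K, d) gamma-filter context vectors, alpha: (K+1,) weights of (1).
        M, K, d = C.shape
        for x in X:
            # Step 3: context descriptors C_k(n) of (2), with c_0 = w_{I_{n-1}}
            desc = np.empty((K, d))
            prev = W[prev_bmu]
            for k in range(K):
                desc[k] = beta * C[prev_bmu, k] + (1.0 - beta) * prev
                prev = C[prev_bmu, k]
            # Step 4: distortion d_i(n) of (1) and ranking r_i of every neuron
            dist = (alpha[0] * np.sum((x - W) ** 2, axis=1)
                    + np.sum((desc[None] - C) ** 2, axis=2) @ alpha[1:])
            rank = np.argsort(np.argsort(dist))       # r_i in {0, ..., M-1}
            h = np.exp(-rank / lam)                   # neighbourhood h_lambda(r_i)
            # Step 5: update rules of (4)
            W += eps_w * h[:, None] * (x - W)
            C += eps_c * h[:, None, None] * (desc[None] - C)
            # Bookkeeping: the current BMU I_n serves as I_{n-1} at the next step
            prev_bmu = int(np.argmin(dist))
        return W, C

Wrapping this pass in an outer loop over epochs, with eps_w, eps_c and lam decayed by the schedule sketched earlier, gives a training loop in the spirit of the procedure used in the experiments below.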

The γ-context model can easily be added to SOM by replacing the neighbourhood function h_λ(r_i) in (4) with a neighbourhood function defined on the 2D output grid, typically a Gaussian centred at the winner unit i*, h_σ(d_G(i, i*)) = exp(−d_G(i, i*)/σ(n)), where d_G is the Manhattan distance and σ is the neighbourhood size; see [9] for more details on γ-SOM.
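For the γ-SOM variant, a sketch of the grid neighbourhood that would replace h_λ(r_i) in (4) is shown below; the row-major grid layout and helper name are our own assumptions.

    import numpy as np

    def grid_neighbourhood(winner, rows, cols, sigma):
        # h_sigma(d_G(i, i*)) = exp(-d_G(i, i*) / sigma), with d_G the Manhattan
        # distance between unit i and the winner i* on a rows x cols output grid.
        r, c = np.divmod(np.arange(rows * cols), cols)   # grid coordinates of every unit
        wr, wc = divmod(int(winner), cols)               # coordinates of the winner i*
        d_grid = np.abs(r - wr) + np.abs(c - wc)
        return np.exp(-d_grid / sigma)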

Experiments were carried out with two data sets: the Mackey-Glass time series [3] and Rössler's chaotic system [8]. Parameter β in (2) was varied from 0.1 to 0.9 in steps of 0.1. Likewise, the number of filter stages, K, was varied from 1 to 9. The number of neurons for both the SOM (square grid) and NG variants was set to M = 0.1L, where L is the length of the time series. Training is done in two stages lasting 100 epochs each. In the first stage, the parameters are set to α_0 = 0.5 and α_k = 0.5/K, ∀ k = 1, ..., K, in (1). This is done to favour a good convergence of the reference vectors in the first stage. In the second training stage, the parameters are set to α_k = (K + 1 − k)/A, ∀ k = 0, ..., K, where A is determined by imposing the constraint Σ_k α_k = 1. The initial and final values of the parameters λ, ε_w, ε_c were set as follows: λ_0 = 0.1M, λ_f = 0.01, ε_w0 = 0.3, ε_wf = 0.001, ε_c0 = 0.3, ε_cf = 0.001. σ was set in the same way as λ. ε_w was fixed to 0.001 during the second training phase.
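The stage-dependent weighting of the α_k parameters can be computed with a small helper such as the following (the function itself is our illustration; the two weighting schemes are those given above):

    import numpy as np

    def alpha_weights(K, stage):
        # Stage 1: alpha_0 = 0.5, alpha_k = 0.5 / K for k = 1, ..., K.
        # Stage 2: alpha_k = (K + 1 - k) / A for k = 0, ..., K, with A = sum_k (K + 1 - k).
        if stage == 1:
            return np.concatenate(([0.5], np.full(K, 0.5 / K)))
        w = np.arange(K + 1, 0, -1, dtype=float)     # K+1, K, ..., 1
        return w / w.sum()                           # enforces sum_k alpha_k = 1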

Fig. 1 Temporal quantisation error against time-steps into the past for the Mackey-Glass time series, using SOM, NG, MSOM (β = 0.6) and γ-SOM (K = 6, β = 0.1)


Figs. 1 and 2 illustrate the TQE as a function of the time-steps into the past for the Mackey-Glass time series, with L = 1000. Fig. 1 shows that γ-SOM obtained a much lower TQE than both SOM and MSOM; in addition, NG obtained a lower TQE than SOM. Fig. 2 shows that γ-NG with K > 1 obtained a lower TQE than MNG (K = 1). In addition, MNG and γ-NG obtained TQEs similar to those of MSOM and γ-SOM, respectively. For the 3D Rössler chaotic system, we drew 4000 points from the first principal component. Fig. 3 illustrates the recurrence plot obtained with the γ-NG state-space reconstruction from the 1D Rössler time series. The latter produces an error of 3.74% when compared point to point with the recurrence plot of the original 3D Rössler system. In contrast, the error using NG is 6.85%, and using MNG it is 5.31%.

Fig. 2 Temporal quantisation error against time-steps into the past for the Mackey-Glass time series, using MNG (β = 0.5) and γ-NG with different K and β parameters (K = 2, β = 0.4; K = 3, β = 0.3; K = 6, β = 0.4; K = 7, β = 0.3)

Fig. 3 Recurrence plot for the γ-NG state-space reconstruction from the 1D Rössler time series

Conclusion: Adding γ-filters to neurons allows self-organising neural networks to be extended to deal with unsupervised sequence processing. The proposed γ-context model yielded significantly lower temporal quantisation errors and better state-space representations than merge-context models. Potential engineering applications of the proposed model include time series analysis, spatio-temporal maps, and the visualisation of temporally or spatially connected data.

Acknowledgment: This work was funded by Conicyt-Chile under grant Fondecyt no. 1080643.

© The Institution of Engineering and Technology 2011
13 January 2011
doi: 10.1049/el.2011.0115

P.A. Estevez, R. Hernandez, C.A. Perez and C.M. Held (Department of Electrical Engineering and Advanced Mining Technology Center, Universidad de Chile, Casilla 412-3, Santiago, Chile)

E-mail: [email protected]

References

1 Kohonen, T.: 'Self-organizing maps' (Springer-Verlag, Berlin, Germany, 1995)
2 Salhi, M.S., Arous, N., and Ellouze, N.: 'Principal temporal extensions of SOM: overview', Int. J. Signal Process. Image Process. Pattern Recognit., 2009, 2, (3), pp. 121-144
3 Strickert, M., and Hammer, B.: 'Merge SOM for temporal data', Neurocomputing, 2005, 64, pp. 39-72
4 Martinetz, T.M., Berkovich, S.G., and Schulten, K.J.: '"Neural-gas" network for vector quantization and its applications to time-series prediction', IEEE Trans. Neural Netw., 1993, 4, pp. 558-569
5 Strickert, M., and Hammer, B.: 'Neural gas for sequences'. Proc. Workshop on Self-Organizing Maps (WSOM), Kyushu, Japan, 2003, pp. 53-58
6 Principe, J.C., de Vries, B., and de Oliveira, P.G.: 'The gamma filter - a new class of adaptive IIR filters with restricted feedback', IEEE Trans. Signal Process., 1993, 41, pp. 649-656
7 Strickert, M., Hammer, B., and Blohm, S.: 'Unsupervised recursive sequence processing', Neurocomputing, 2005, 63, pp. 69-97
8 Marwan, N., Romano, M.C., Thiel, M., and Kurths, J.: 'Recurrence plots for the analysis of complex systems', Phys. Rep., 2007, 438, pp. 237-329
9 Estevez, P.A., and Hernandez, R.: 'Gamma SOM for temporal sequence processing', LNCS, 2009, 5629, pp. 63-71
