An Enhancing Dynamic Self-Organizing Map for Data Clustering*
Ting Wang, Xinghuo Yu, Damminda Alahakoon, and Shumin Fei
2013 10th IEEE International Conference on Control and Automation (ICCA), Hangzhou, China, June 12-14, 2013

Abstract— This paper presents a novel growing self-organizing map which features incremental learning, a dynamic network structure and good visualization ability. It allows for on-line and continuous learning on both static and evolving data distributions. Experiments are carried out on benchmark data sets for vector quantisation and clustering. Compared with the GSOM method, our results show that the new model achieves better or comparable performance on real-world data sets.

I. INTRODUCTION

With the rapid growth of the Internet and hardware technology, many information systems need to process on-line data streams that are updated frequently and are infeasible to store in memory completely, such as stock market indexes, readings of smart meters, and video streams transferred across the Internet. Since traditional data mining methods deal with static data sets and typically require multiple scans of the data, they cannot give efficient results for the analysis of data streams. There has been increasing interest in developing new data mining techniques to convert this tremendous amount of data into useful information, such as dynamic clustering, visualization, and incremental learning.

Clustering is the process of grouping data into classes or clusters, so that objects within a cluster have high similarity to one another but are very dissimilar to objects in other clusters [1]. The Self-Organizing Map (SOM) is a very well-known data analysis tool for data visualization and clustering [2]-[4]. Learning in the SOM is based on the concept that the behavior of a node should impact only the nodes and arcs close to it. Weights are initially assigned randomly and adjusted during the learning process to produce better results. During the learning process, hidden features or patterns in the data are uncovered and the weights are adjusted accordingly. The weights of the output network are eventually expected to converge to represent the statistical regularities of the input data.

*Research supported by the National Natural Science Foundation of China under grants 60905009, 61004032, 61172135, and the Jiangsu Natural Science Foundation under grants SBK201240801 and BK2012384.

Ting Wang was with the School of Automation, Southeast University, Nanjing, 210096 China. She is now with the School of Electric and Computer Science, RMIT University, VIC 3167 Australia (corresponding author; phone: 0424-323-846; e-mail: [email protected]).

Xinghuo Yu is now with the Platform Technologies Research Institute, RMIT University, VIC 3167 Australia (e-mail: [email protected]).

Damminda Alahakoon is with the Clayton School of Information Technology, Monash University, VIC 3800 Australia (e-mail: [email protected]).

Shumin Fei is with the School of Automation, Southeast University, Nanjing, 210096 China (e-mail: [email protected]).

One of the drawbacks of the SOM is that the size and the number of nodes have to be predetermined [5], yet only at the completion of a simulation can it be identified that a differently sized network would be more appropriate for the application. This may lead to many experiments with different map sizes in an attempt to obtain the optimum network. The fixed topology of the feature map and the fixed number of nodes are another disadvantage of the SOM: once a map is learned, it cannot change its size to suit a new data environment. The low-dimensional (2- or 3-dimensional) feature map space is intended to facilitate visualization, but for high-dimensional data the feature map can fold and hence produce a poor topological representation of the data. Methods that identify and repair such topological defects are costly. Further problems, such as the absence of cluster boundaries, also exist in the SOM [10].

The solution to the problems mentioned above is to design an incremental, growing neural model that can adapt its structure to better represent the clusters in the data, i.e., a self-generating model. The network should therefore grow incrementally, acquiring the number of nodes needed to solve the current task, determining the number of clusters, and accommodating input patterns from on-line evolving data streams.

In this paper we propose a new incremental neural network which allows for on-line and continuous learning on both static and evolving data distributions. This model is a topology-based neural network, a variant of the self-generating map. An overview of existing structure-adapting neural networks is therefore given before the proposed new incremental neural network algorithm is introduced in the next section. Section III presents simulations on benchmark data sets for clustering and topology preservation, with results compared against other methods. Finally, discussion and conclusions are set out in Section IV.

II. THE COMPUTATIONAL MODEL

We first introduce the general properties of structure-adapting models. The network structure is a graph consisting of a set of nodes $N$ and a set of edges $E$ connecting the nodes. Each node $i$ is attached a weight vector (or reference vector, position vector) $w_i$ in the input space. During the learning procedure, the weight vectors are adapted by moving the position of the best matching unit (or the winner) $q$ and its topological neighbours in the graph toward the input data. For an input signal $v$ the corresponding best matching unit $q$ is:


$$q = \arg\min_{i \in N} \| v - w_i \|$$

and the weight vector is updated as follows:
$$\Delta w_i = \gamma(k)\, h_{q,i}\, (v - w_i)$$
where $\gamma(k)$ is a scalar-valued learning rate factor and $h_{q,i}$ is a neighbourhood function.
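To ground these two formulas, the following is a minimal sketch of one adaptation step in Python/NumPy. The array layout, the dictionary-based adjacency and the Gaussian form chosen for $h_{q,i}$ are illustrative assumptions, not prescriptions from any particular model discussed below.

```python
import numpy as np

def adapt_step(weights, neighbors, v, gamma, sigma):
    """One generic adaptation step for a structure-adapting map.

    weights:   (n_nodes, dim) array of reference vectors w_i
    neighbors: dict mapping a node index to the indices of its graph neighbours
    v:         input signal, shape (dim,)
    gamma:     scalar learning rate gamma(k)
    sigma:     width of the (assumed Gaussian) neighbourhood function h_{q,i}
    """
    # q = argmin_{i in N} ||v - w_i||   (best matching unit)
    q = int(np.argmin(np.linalg.norm(weights - v, axis=1)))
    # Delta w_i = gamma(k) * h_{q,i} * (v - w_i) for the winner and its neighbours
    wq = weights[q].copy()   # freeze the winner's position for the kernel
    for i in [q] + list(neighbors.get(q, [])):
        h = np.exp(-np.linalg.norm(wq - weights[i]) ** 2 / (2 * sigma**2))
        weights[i] += gamma * h * (v - weights[i])
    return q
```

Each of the models surveyed next refines this generic step with its own rules for inserting nodes and maintaining edges.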

A. Preliminary of Structure Adapting Neural Network Models
There are six typical models of structure-adapting neural networks.

The Growing Cell Structures (GCS) model [6] is based on the SOM algorithm, but the basic two-dimensional grid of the SOM has been replaced by a structure consisting of hyper-tetrahedrons with a fixed dimensionality chosen in advance.

The Growing Neural Gas (GNG) model [8] is based on the neural gas algorithm [7] and imposes no explicit constraints on the graph topology. The graph is generated and updated continuously by competitive Hebbian Learning. The adaptation of weight vectors and the insertion rule of new nodes are the same as in GCS. The topology of a GNG network can reflect the topology of the input data distribution and can have different dimensionalities in different parts of the input space. For this reason it is only possible to visualize low dimensional input data.

The Incremental Grid Growing (IGG) model in [9, 10] starts with four connected nodes and generates new nodes from the boundary of the network using a growth heuristic. Adding nodes only at the perimeter allows the IGG network to always maintain a two-dimensional structure, which makes visualization easy: the structure of the data is apparent in the structure of the network without having to plot the weight values. Furthermore, connections are added when the difference between the weight vectors of neighbouring nodes drops below a "connect" threshold value and removed when the difference exceeds a "disconnect" threshold value.

The Growing Self-Organizing Map (GSOM) [5] combines the weight vector adaptation of the SOM with the strategy of adding new nodes only from the boundary of the network. It uses a weight initialization method for newly added nodes that is simpler than the ones in IGG and reduces the possibility of twisted maps. In addition to this advantage, by introducing a spread factor to control the spread of the GSOM, hierarchical clustering at different levels can be achieved.

The Evolving Self-Organizing Map (ESOM) [11, 12] starts without nodes. During learning, the network creates new nodes and sets up connections dynamically whenever none of the existing nodes matches the input vector within a distance threshold. Insertion of nodes is direct rather than using mid-point interpolation as in GCS and GNG. The strength of the neighbourhood relation is determined by the distance between connected nodes: if the distance is too large, the connection is weak and should be pruned.

Incremental Growing Neural Gas (IGNG) [13]: the graph topology of IGNG is unconstrained and continuously updated by competitive Hebbian learning for incremental cluster detection. It performs a vigilance test, as in Adaptive Resonance Theory (ART), to decide whether the winning node is close enough to the input signal. Only if the two nearest nodes satisfy the vigilance test are their weight vectors adapted; otherwise, a new node is created whose weight vector is exactly the input datum. This algorithm can learn new input data without degrading the previously trained network or forgetting the old input data.

B. The Proposed Algorithm
The whole process is illustrated in detail as follows (Fig. 2); a minimal code sketch of these steps is given after the list.
1) Initialize the weight vectors of the starting four nodes with random values (Fig. 1a).
2) Input a new data vector $v$.
3) Determine the winning node that is closest to the input vector on the current feature map, using the Euclidean distance (Fig. 1b). This step can be summarized as: find $q$ such that $\|v - w_q\| \le \|v - w_i\|, \; \forall i \in N$, where $v$ and $w$ are the input and weight vectors, respectively, $q$ is the winner, and $N$ is the set of nodes.
4) If the minimum distance is smaller than the similarity threshold $\varepsilon$, i.e., $\|v - w_q\| \le \varepsilon$, update the winner and its direct neighbours according to the distance to the input $v$ (Fig. 1c):
$$\Delta w_i = r\, e^{-\frac{d(w_q,\, w_i)^2}{2\sigma^2}}\,(v - w_i)$$
where $r$ is a small constant learning rate and $\sigma$ controls the spread of the neighbourhood. Usually we set $\sigma = \varepsilon$.
5) If the minimum distance is larger than the similarity threshold $\varepsilon$, i.e., for all nodes $i = 1, \ldots, N$ in the current network
$$d(w_i, v) = \|v - w_i\| > \varepsilon,$$
grow a new node which represents exactly the input $v$, i.e., $w_{new} = v$, from $q$ if $q$ is a boundary node (Fig. 1d).
6) If $q$ is a non-boundary node, search for the second-nearest node $j$ to the current input $v$ (Fig. 1e).
7) If $j$ is a boundary node, grow new nodes from $j$ as in step 5 (Fig. 1f). If $j$ is also a non-boundary node, go back to step 2) and repeat all the steps above.
8) Prune weak connections and inactive nodes.
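The following condensed Python sketch follows steps 1-8 under simplifying assumptions the paper leaves open at this level of detail: nodes are bookkept on an integer grid, a boundary node is one with at least one free adjacent grid position, a new node takes the first such free position, and the pruning of step 8 is deferred to a separate sketch later in this section.

```python
import numpy as np

class GrowingSOM:
    """Condensed sketch of the proposed algorithm (steps 1-8)."""

    def __init__(self, dim, eps, r=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.eps, self.r = eps, r
        self.sigma = eps                      # usually sigma = epsilon (step 4)
        # step 1: four connected starting nodes with random weight vectors
        self.pos = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}  # grid -> node id
        self.w = {i: rng.random(dim) for i in range(4)}

    def _adjacent(self, p):
        return [(p[0] + dx, p[1] + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

    def _is_boundary(self, p):
        # assumption: a boundary node has at least one free adjacent grid position
        return any(n not in self.pos for n in self._adjacent(p))

    def train_step(self, v):
        # step 3: the winner q minimizes the Euclidean distance to v
        dist = {p: np.linalg.norm(v - self.w[i]) for p, i in self.pos.items()}
        q = min(dist, key=dist.get)
        if dist[q] <= self.eps:
            # step 4: Gaussian-weighted update of the winner and direct neighbours
            wq = self.w[self.pos[q]].copy()
            for p in [q] + [n for n in self._adjacent(q) if n in self.pos]:
                i = self.pos[p]
                h = np.exp(-np.linalg.norm(wq - self.w[i]) ** 2 / (2 * self.sigma**2))
                self.w[i] += self.r * h * (v - self.w[i])
            return
        # steps 5-7: try to grow from the winner, then from the second-nearest node
        for cand in sorted(dist, key=dist.get)[:2]:
            if self._is_boundary(cand):
                free = next(n for n in self._adjacent(cand) if n not in self.pos)
                new_id = len(self.w)
                self.pos[free] = new_id
                self.w[new_id] = np.asarray(v, dtype=float).copy()  # w_new = v
                return
        # neither q nor j is a boundary node: go back to step 2 with the next input
```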

Our model is similar to the GSOM and IGG in that it initializes with four connected nodes and generates new nodes only from the boundary of the network when necessary. The automatic growth of the network is no longer dependent on the accumulated error or the number of input data received by a node, but on whether the existing nodes are capable of representing the current input, as judged by the similarity threshold. This growing criterion is identical to the one in ESOM and IGNG. If the existing nodes do not reach the required similarity level, new nodes are added to the network; the weight vector of a newly added node is the corresponding input, i.e., $w_{new} = v$. Connections between nodes are used to maintain the neighbourhood relationships between close nodes. The connection strength of nodes $i$ and $j$ depends on $d(w_i, w_j)$, the distance between their weight vectors: the smaller the distance, the stronger the connection between the two nodes. Thus a "disconnect" threshold parameter is used to detect pairs of nodes that are connected even though they represent distant data in the input space. Exceeding this threshold indicates that the connection between the two nodes is so weak that it should be deleted from the network.

Figure 1. (a)-(f) Illustration of the initialization, winner adaptation and boundary-growth cases of the proposed algorithm.

Figure 2. The flowchart of the proposed algorithm.

There are two points worth mentioning. One is that the weight adaptation is limited to the winner and its direct neighbours. This saves time compared with algorithms such as the SOM and GSOM, which need many iterations to update the weight vectors of nodes in different neighbourhoods. The other is that when new nodes grow from a boundary node, whether the winner or the second-nearest node, if a node already exists in the location of the boundary node's direct neighbour and there is no connection between them, we simply link them and update the existing node's weight vector to the input $v$. This situation is illustrated in Fig. 1f.
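A sketch of the pruning in step 8, applying the "disconnect" criterion just described; the set-of-pairs representation of connections is an assumption for illustration.

```python
import numpy as np

def prune(connections, weights, disconnect_threshold):
    """Step 8 sketch: remove weak connections, then inactive nodes.

    connections: set of (i, j) node-id pairs   (representation assumed)
    weights:     dict mapping node id -> weight vector
    """
    # a connection whose endpoint weights are farther apart than the
    # "disconnect" threshold is considered too weak and is dropped
    kept = {(i, j) for (i, j) in connections
            if np.linalg.norm(weights[i] - weights[j]) <= disconnect_threshold}
    # nodes left without any connection are treated as inactive and removed
    active = {n for pair in kept for n in pair}
    return kept, {n: w for n, w in weights.items() if n in active}
```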

III. EXPERIMENTAL RESULTS

In this section, we give experimental results to demonstrate the general behaviour and performance of the new model. The data sets used in this study include:

A. The Iris Data [16]
This popular benchmark data set consists of three species of Iris (Iris setosa, Iris versicolour, and Iris virginica), each of which has 50 flowers with four attributes: petal length, petal width, sepal length and sepal width. In this experiment, the data set is used to demonstrate the usefulness of our algorithm as an unsupervised data mining tool, so we prefer to look for unforeseen groupings or sub-groupings in the data rather than the traditionally expected outcome of 3-cluster identification accuracy. The training process is repeated 20 times with the training samples presented in random order. The attribute values are scaled to the range 0.0 to 1.0 before being input to the model. The results are shown in Tab. 1. Fig. 3 presents one of the generated maps from this experiment; the number on each node denotes the order in which the node was created.
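As a sketch of this experimental setup (the file name and loading details are assumptions; GrowingSOM refers to the sketch class from Section II):

```python
import numpy as np

# load the four numeric attributes of the 150 Iris records (path/format assumed)
X = np.loadtxt("iris.data", delimiter=",", usecols=(0, 1, 2, 3))
# scale every attribute to the range 0.0 - 1.0, as described above
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

rng = np.random.default_rng()
for run in range(20):                       # 20 repetitions of the training process
    som = GrowingSOM(dim=4, eps=0.08)       # epsilon = 0.08, as in Fig. 3
    for v in X[rng.permutation(len(X))]:    # samples presented in random order
        som.train_step(v)
```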

It can be seen that during the training process, besides the initial four nodes, 82 nodes are added to the network continuously. Three groupings are visually separated. If we mark the class label on each node, as shown in Fig. 4, we can see that class setosa has far fewer nodes than either of the other classes. This indicates that the density of data in the region of class setosa is considerably higher than in the others; in other words, the intra-cluster similarity of class setosa is significantly better, which is verified by Tab. 2. Another interesting phenomenon in Fig. 4 is that three nodes (nodes 11, 49 and 50) are hit by class versicolour as well as class virginica. For example, node 49 is hit by data 67, data 85 and data 107. In the original data set, data 67 and data 85 belong to class versicolour while data 107 belongs to class virginica. But according to our algorithm, the Euclidean distance between data 107 and the centroid of class versicolour is smaller than the distance between it and the centroid of class virginica. The other reason data 107 is mapped to the same node as data 67 and data 85 is that it happens to lie on both the edge of the region of class virginica and the edge of the region of class versicolour. The ranges of the four attributes of class virginica and class versicolour are $[0.1667, 1] \times [0.0833, 0.75] \times [0.5932, 1] \times [0.5417, 1]$ and $[0.1667, 0.75] \times [0, 0.5833] \times [0.339, 0.6949] \times [0.375, 0.7083]$, respectively, with
data 107: (0.1667, 0.2084, 0.5932, 0.6667)
centroid of versicolour: (0.4544, 0.3208, 0.5525, 0.5108)
centroid of virginica: (0.6356, 0.4058, 0.7715, 0.8025).
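The distance claim is easy to verify numerically from the vectors above (printed values rounded):

```python
import numpy as np

x107        = np.array([0.1667, 0.2084, 0.5932, 0.6667])
versicolour = np.array([0.4544, 0.3208, 0.5525, 0.5108])
virginica   = np.array([0.6356, 0.4058, 0.7715, 0.8025])

print(np.linalg.norm(x107 - versicolour))  # ~0.348
print(np.linalg.norm(x107 - virginica))    # ~0.556: data 107 is nearer versicolour
```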

TABLE 1. 20 TRAINING RESULTS OF THE EXPERIMENTS ON THE IRIS AND STATLOG (IMAGE SEGMENTATION) DATA SETS

Run                  Iris: nodes   Iris: time (s)   Statlog: nodes   Statlog: time (s)
1                    79            0.407            372              6.563
2                    72            0.407            381              7.281
3                    67            0.343            397              7.625
4                    78            0.375            366              6.625
5                    70            0.406            371              6.829
6                    78            0.422            358              6.312
7                    66            0.344            365              6.484
8                    71            0.359            379              7.157
9                    77            0.437            420              7.766
10                   82            0.468            367              6.360
11                   87            0.437            381              6.266
12                   66            0.359            356              5.843
13                   74            0.406            383              6.671
14                   80            0.437            365              5.984
15                   78            0.422            369              6.063
16                   75            0.391            373              6.094
17                   79            0.469            402              6.859
18                   76            0.375            386              7.140
19                   77            0.407            382              6.641
20                   81            0.453            365              6.094
Mean value           75.65         0.406            376.5            6.633
Standard deviation   5.434         0.0374           15.244           0.529

In Tab. 3, the performance of the GSOM is given. Even with different neighbourhood radii r and different numbers of epochs, in most cases the GSOM divides the whole data set into two groupings: one is the setosa class, and the other is the union of class versicolour and class virginica. It is very hard to separate these two classes because they are not linearly separable. The new algorithm achieves better performance on this point: in particular, for the setosa class the recognition accuracy reaches 100%, and virginica is separated into two clusters on either side of versicolour.

Figure 3. Feature map representation for the Iris data, with epochs = 1, ε = 0.08, r = 0.1, and disconnect threshold = 0.5.

Figure 4. The Iris feature map with class labels marked: setosa, versicolour, virginica.

B. The Statlog (Image Segmentation) Data Set [16]
To test the scalability of the new algorithm, we choose the Statlog (Image Segmentation) data set [16], which contains 19 attributes and 2310 data entries, 330 entries per class. We again set epochs = 1, ε = 0.08, r = 0.1 and disconnect threshold = 0.5, and repeat the training process 20 times with the training samples presented in random order. The results are also shown in Tab. 1. Fig. 5 presents one of the generated maps with class labels from this experiment.

TABLE 2. SUMMARY STATISTICS OF THE IRIS DATA

Class         Sepal length (std)   Sepal width (std)   Petal length (std)   Petal width (std)   SSE
setosa        0.0969               0.1572              0.0291               0.0442              1.8449
versicolour   0.1419               0.1294              0.0788               0.0816              2.4886
virginica     0.1748               0.1330              0.0926               0.1133              3.4840


TABLE 3. THE PERFORMANCE OF GSOM
(each cell: number of nodes / number of clusters / number of data in each cluster)

      epochs=1              epochs=5              epochs=20                         epochs=50
r=1   8 / 3 / 2,56,92       17 / 2 / 50,100       37 / 2 / 61,89                    44 / 4 / 1,49,4,96
r=2   12 / 4 / 45,50,0,55   28 / 4 / 1,32,49,68   66 / 9 / 0,34,24,42,1,22,4,5,18   116 / 2 / 50,100
r=3   15 / 2 / 52,98        32 / 2 / 50,100       71 / 2 / 50,100                   106 / 2 / 55,95
r=4   14 / 4 / 50,53,2,45   35 / 2 / 51,99        72 / 2 / 54,96                    116 / 2 / 50,100


It can be seen that class 7 has the fewest corresponding nodes and that class 2 is the most concentrated class, followed by class 6. Class 3 is the most diffuse class, which corresponds to the biggest SSE value. In fact, nodes 35, 52, 2, 8 and 7 hold 98.2% of the entries of the whole class, and data 302, which hits node 264, is an outlier of this class. The summary statistics of these 7 classes are given in Tab. 4.

TABLE 4. SUMMARY STATISTICS OF THE STATLOG DATA

       Class 1   Class 2   Class 3   Class 4   Class 5   Class 6   Class 7
SSE    49.41     61.42     110.05    82.81     82.75     57.37     51.66

IV. CONCLUSION
This paper has introduced an incremental extension of the SOM model, featuring fast learning, simpler growing criteria, easy visualization and more efficient weight updating compared with similar models. Experiments on benchmark data sets show that the proposed algorithm works well for pattern clustering and classification tasks. Applications of our model to text clustering and on-line smart meter readings are under investigation.

REFERENCES
[1] J. Han, Data Mining: Concepts and Techniques, 2nd ed. China Machine Press, 2006.
[2] T. Kohonen, Self-Organizing Maps, 3rd ed. Berlin: Springer-Verlag, 2001.
[3] J. Vesanto and E. Alhoniemi, "Clustering of the self-organizing map," IEEE Trans. Neural Networks, vol. 11, pp. 586-600, May 2000.
[4] H. Liu, T. Eklund, B. Back and H. Vanharanta, "Visual Data Mining: Using Self-Organizing Maps for Electricity Distribution Regulation," in Proc. International Conference on Digital Enterprise and Information Systems (DEIS 2011), London, UK, July 20-22, 2011, Springer CCIS 194, pp. 631-646.
[5] D. Alahakoon, S. K. Halgamuge, and B. Srinivasan, "Dynamic self-organizing maps with controlled growth for knowledge discovery," IEEE Trans. Neural Networks, vol. 11, pp. 601-614, May 2000.
[6] B. Fritzke, "Growing Cell Structures - A Self-Organizing Network for Unsupervised and Supervised Learning," Neural Networks, vol. 7, pp. 1441-1460, 1994.
[7] T. Martinetz and K. Schulten, "A Neural-Gas Network Learns Topologies," in Artificial Neural Networks, pp. 397-402, North-Holland, Amsterdam.
[8] B. Fritzke, "A Growing Neural Gas Network Learns Topologies," in Advances in Neural Information Processing Systems, pp. 625-632, MIT Press, Cambridge, MA, 1995.
[9] J. Blackmore and R. Miikkulainen, "Incremental grid growing: Encoding high-dimensional structure into a two-dimensional feature map," University of Texas at Austin, Technical Report AI92-192, Austin, TX, 1992.
[10] J. Blackmore, "Visualising high dimensional structure with the incremental grid growing neural network," M.S. thesis, Univ. Texas at Austin, 1995.
[11] D. Deng and N. Kasabov, "ESOM: an algorithm to evolve self-organizing maps from on-line data streams," in Proc. IJCNN 2000, pp. 3-8, New York.
[12] D. Deng and N. Kasabov, "On-line pattern analysis by evolving self-organizing maps," Neurocomputing, vol. 51, pp. 87-103, 2003.
[13] Y. Prudent and A. Ennaji, "An Incremental Growing Neural Gas Learns Topologies," in Proc. International Joint Conference on Neural Networks 2005, pp. 1211-1216.
[14] S. Matharage, D. Alahakoon, J. Rajapakse and P. Huang, "Fast Growing Self Organizing Map for Text Clustering," Neural Information Processing, vol. 7063, pp. 406-415, 2011.
[15] G. Zhu and X. Zhu, "The Growing Self-Organizing Map for Clustering Algorithms in Programming Codes," in 2010 International Conference on Artificial Intelligence and Computational Intelligence (AICI), 2010.
[16] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html.
[17] S. V. Verdu, M. O. Garcia and F. J. Garcia Franco, "Characterization and Identification of Electrical Customers Through the Use of Self-Organizing Maps and Daily Load Parameters," in Proc. IEEE Power Systems Conference and Exposition, vol. 2, pp. 899-906, 2004.


Figure 5. Feature map representation for the Statlog (Image Segmentation) data set, with epochs = 1, ε = 0.08, r = 0.1, and disconnect threshold = 0.5. Node markers denote classes 1 through 7.
